Programming and politics: 2009

Saturday, December 19, 2009

Music of the Decade

In alphabetical order (more or less) ...

Favorite Songs

Rehab -- Amy Winehouse
The Rising -- Bruce Springsteen
Yellow -- Coldplay
The Scientist -- Coldplay
Clocks -- Coldplay
Stan -- Eminem
Bring Me To Life -- Evanescence
Where'd You Go -- Fort Minor
Crazy -- Gnarls Barkley
Feel Good, Inc. -- Gnarls Barkley
Izzo (H.O.V.A.) -- Jay-Z
D.O.A. (Death of Auto-Tune) -- Jay-Z
Gold Digger -- Kanye West
Gone -- Kanye West
On Call -- Kings of Leon
Dance With My Father -- Luther Vandross
Paper Planes -- MIA
Float On -- Modest Mouse
Hey Ya! -- Outkast
Young Folks -- Peter Bjorn and John
Dani California -- Red Hot Chili Peppers
Killing the Blues -- Robert Plant & Alison Krauss
So Says I -- The Shins
Lazy Eye -- Silversun Pickups
City of Blinding Lights -- U2
Beverly Hills -- Weezer
The Joker & The Thief -- Wolfmother
Seven Nation Army -- The White Stripes
In Da Club -- 50 Cent

Favorite Albums

Neon Bible - Arcade Fire
The Rising -- Bruce Springsteen
Parachutes -- Coldplay
A Rush of Blood to the Head -- Coldplay
Viva La Vida -- Coldplay
The Crane Wife -- The Decemberists
American Idiot -- Green Day
The Black Album -- Jay-Z
The Blueprint -- Jay-Z
Late Registration -- Kanye West
The National -- Boxer
The Slip -- Nine Inch Nails
Year Zero -- Nine Inch Nails
Speakerboxxx/The Love Below -- Outkast
Backspacer -- Pearl Jam
Wolfgang Amadeus Phoenix -- Phoenix
In Rainbows -- Radiohead
Kid A -- Radiohead
Stadium Arcadium -- Red Hot Chili Peppers
Carnavas -- Silversun Pickups
Gimme Fiction -- Spoon
Is This It -- The Strokes
Toxicity -- System of a Down
All That You Can't Leave Behind -- U2
No Line on the Horizon -- U2
Vampire Weekend -- Vampire Weekend
Elephant -- The White Stripes

Tuesday, November 24, 2009

Web Developers Are Stupid and Arrogant

The past couple of days has seen an amusing rant by PPK, that then turned into a retraction and call to action. The original rant included a condemnation of iPhone developers as being stupid and arrogant. Others have adequately refuted PPK, so I won't bother with any of that. His post made me realize that it is in fact web developers who are mostly commonly guilty of stupidity and arrogance. Here's what I mean.

It's easy to look at the world today and say that web applications have won. This is web developer arrogance. Stupidity is to think that web applications have won because web applications are superior to desktop applications. Smarter, but probably still arrogant developers would point to web applications as disruptive technologies. This involves admitting that web applications are inferior, but good enough, and present enough other "cheaper" advantages to compensate for their inferiority.

To understand why the "web apps have won" claim is dubious. There are definitely a lot of awesome web applications out there. Many of them were created back in the mid/late 90's, The "features" of these applications were the key to applications, not the user interface. Now these days, most of these web applications offer APIs/web services/RESTful interfaces/whatever you want to call them. In many cases it is possible to build desktop applications that tap into the same features as these web applications. However, this was certainly not the case 10-15 years ago.

So if APIs make it possible to build desktop apps that offer the same features as popular web applications, why haven't people switched? The first most obvious answer is inertia. If you are used to accessing Google or Amazon on the web, that's probably the way that you will always use it. Something else to consider is that for many web applications, it does not make sense for them to offer all of their features through APIs because it hurts their core business. This is most obviously true for advertising based companies like Google, Yahoo, and Facebook. Their web applications not only provide very useful features to end users, but they serve ads that make money for the companies. If all of their users switched to using desktop applications that only offered the features with no ads, then the companies would lose their revenue streams. Their business is connected at the hip with their UI, so it is in their best interest to make sure people use their UI -- which is a web application.

However, there are other very successful web applications whose main revenue does not come from ads. Their business is distinct from their UI. E-commerce companies like Amazon and my employer, eBay are obvious examples. For example, eBay offers trading APIs that provide almost all of the trading features of eBay. This is particularly true for selling on eBay. This makes sense, as eBay does not need a seller to use the eBay UI to sell something, as the UI is not what makes money for eBay. As a result, around 50% of all items for sale on eBay come through 3rd party applications built on top of the eBay trading APIs. The vast majority of these (especially the popular ones) are desktop applications. Give people a choice, and a lot of people choose desktop applications.

For another interesting example, just look at Twitter. This is a company that came into an existence after web APIs had become the norm. So Twitter has provided a comprehensive set of APIs since early in its existence. Further, they have not pursued an advertising model that would marry their web based UI to any revenue streams. So they have kept their APIs in sync with their features. For example, they recently added list and retweet features to their site, and added them to the APIs at the same time. As a result, there are a huge number of desktop applications for accessing Twitter. Indeed, Twitter says that 80% of their traffic is from APIs -- either desktop or mobile applications. For most Twitter users, there have always been feature-equivalent desktop alternatives to Twitter's web based UI, so many users chose desktop applications over the web.

Finally, let's look at one more example: existing desktop apps. There has been an incredible amount of money spent on creating web applications that provide similar functionality to traditional desktop applications: email, word processing, etc. Heck, Google has spent a lot in this space just by itself. These are useful applications, but it is rare for people to choose these apps over their desktop equivalents. In most cases these apps try to go the disruptive route, i.e. don't try to be as good, but good enough and cheaper. They have had little success so far. Of course, inertia is a valid argument here, too. The one case where there has been success is GMail. In my opinion its success is not because people like it's web UI over a desktop UI, or even that the web UI is "good enough" and cheaper. No, it's success is because it has offered innovative features over other web and desktop based alternatives: fast search, stars/labels, threaded conversations, etc. Even give all of that, many people still choose to use desktop clients to access their GMail (I'm definitely not one of them.) Anyways, once again it's the features, not the UI.

I am not going to sit here and claim that desktop wins over web all the time. I'm not arrogant or stupid enough to make such claims. However, be wary of claims that web apps win over desktop apps. One could argue that with the preponderance of APIs (especially spurned on by mobile apps) and the popping of the advertising based web 2.0 bubble, that the future will hold even more opportunities for desktop alternatives to web applications. Maybe web applications have jumped the shark. So don't put up with web developers who insist that web applications have won (especially if they try to extrapolate this flawed argument to the mobile world). They can go on and on about technology, standards, interoperability, etc. Just remind them that it's the users who matter, and when given a choice, the users do not always choose web applications. Time to polish off your MFC and Cocoa skills!

Saturday, November 21, 2009

Passing on the Duby

Last week I had a fun conversation with Charles Nutter. It was about "java.next" and in particular, Charlie's requirements for a true replacement for Java. He stated his requirements slightly before that conversation, so let me recap:

1.) Pluggable compiler
2.) No runtime library
3.) Statically typed, but with inference
4.) Clean syntax
5.) Closures

I love this list. It is a short list, but just imagine if Java 7 was even close to this. Anyways, while I like the list as a whole, I strongly objected to #2 -- No runtime library. Now, I can certainly understand Charlie's objection to a runtime library. It is a lot of baggage to drag around. All of the JVM langs have this issue. JRuby for example, has an 8 MB jar. Groovy has a 4 MB jar. Clojure has two jars totaling 4.5 MB. Scala is also around 4 MB. These become a real issue if you want to use any of these languages for something like Android development. I've written about this issue as well.

However, there is a major problem with this requirement. The building blocks of any programming language includes primitives and built-in types. Those built-in types are part of the runtime library of the language. So if your JVM based language cannot have a runtime library, then you will have to make do with primitives and Java's built-in types. Why is this so bad? Java's types (mostly) make sense in the context of Java, its design principles and syntax. I don't think they would make sense in the java.next hypothetical language described above. The easiest example of this are collection classes. Don't you want a list that can make use of closures (#5 requirement above) for things like map, reduce, filter, etc. ? Similarly, if you static typing, you probably have generics, and wouldn't you like some of your collections to be covariant? You can go even further and start talking immutability and persistent data structures, and all of the benefits these things bring to concurrent programming. This is just collections (though obviously collections are quite fundamental to any language,) but similar arguments apply to things like IO, threading, XML processing, even graphics (I'd like my buttons to work with closures for event handling thank you very much.)

One argument against this is that you can just include the runtime library of your choice. Pick your own list, hash map, thread pool, and file handler implementations. This is what I'd like to call the C++ solution -- only worse. At least in C++ you can generally count on the STL being available. The thing that is so bad about this is that it really limits the higher-order libraries that can be built. Java, Ruby, and Python all have a plethora of higher order libraries that have been built and widely used. These libraries make extensive use of the built-in types of those languages. Imagine building ORMs like Hibernate or ActiveRecord if you did not have built-in types (especially collection classes) that were consistent with the language. You could not do it. If all you could rely on was the Java built-in types, then at best your libraries would have a very Java-ish feel to them, and doesn't that defeat the purpose?

Charlie gave an alternative to this -- leverage the compiler plugins. Now it is certainly true that with a pluggable compiler, you could do this like add a map method to the java.util.List interface, and all of the JDK classes that implement this interface. It can be done. However, if you are going to build higher order libraries on top of this language, then you need to be able to count on these enhancements being present. In other words, the compiler plugin needs to be a required part of the compiler. Fine. Now what is it that we were trying to avoid? Extra runtime baggage, right? Well if you have a compiler that is going to enhance a large number of the classes in the JDK, hello baggage. Maybe it won't be as much baggage as 4 MB runtime library, but it might not be a lot less either. Also, it raises the potential problem of interoperability with existing Java libraries. If I import a Hibernate jar that gives me a java.util.List, that list is not going to be enhanced by my compiler -- because it wasn't compiled with my javac.next, it was compiled with javac.old. What's going to happen here?

Now there is an obvious happy way to deal with the runtime library question: rt.jar. If java.next was Java, and the super revamped collections, IO, threading, etc. classes were part of the JDK, then there is no need for a runtime library since now it has been included with the JDK. However, given the incredibly uncertain state of Java 7, not to mention the long term prospects of Java, does this seem remotely possible to anyone? I don't think so. I think the heir apparent to Java cannot be Java, and I think because of that, it's gotta have a runtime library of its own.

Monday, November 16, 2009

Cross Browser Geolocation

You might have heard that Joe Hewitt has given Apple the finger, and is moving on (or more accurately moving back to) mobile web development. You can do a lot on mobile web browsers -- more than you can do on desktop browsers. This can seem counterintuitive at first. However, flavors of WebKit have a huge share of mobile web use, and it supports a lot of features that you can't count on in the IE-dominated world of desktop browsers. One of those coveted features is geolocation. However, geolocation support is far from standardized even among the high end WebKit based browsers.

Geolocation on Mobile Safari is nice, or more accurately it is "standardized." It follows the W3C standard. All you have to do is access the navigator.geolocation object. Here is a super simple example:

var gps = navigator.geolocation;
    if (gps){
        gps.getCurrentPosition(successHandler);
    } 
}

function successHandler(position){
  var lat = position.coords.latitude;
  var long = position.coords.longitude;
  doSomethingUseful(lat, long);
}

Pretty easy, huh? This will work on Mobile Safari running on iPhone OS 3+. As mentioned, it is standard. It will also work on newer versions of Firefox, which presumably includes Mobile Firefox, a.k.a. Fennec. However, that is the limit of the portability of this code. It will not work in Android's browser.
Android's flavor of WebKit, does not support the standard. However, it does support geolocation via Google Gears. That's right, instead of supporting the open standard, it implements the same feature using a proprietary technology. Shocking, I know. So if you are going to run JS that should access geolocation on both the iPhone and one of the zillion Android phones out there, you will need to include the gears_init.js bootstrap file. Then you can re-write the above like this:

function load(){
    var gps = navigator.geolocation;
    if (gps){
        gps.getCurrentPosition(successHandler);
    } else {
        if (window.google && google.gears){
            gps = google.gears.factory.create('beta.geolocation');
            gps.getCurrentPosition(function (position){
               successHandler({coords : position});
            }) ;
        }
    }
}

function successHandler(position){
  var lat = position.coords.latitude;
  var long = position.coords.longitude;
  doSomethingUseful(lat, long);
}

So this is pretty close to the standard, but not quite. The Gears variant also supports the getCurrentPosition API. However, the object that it passes to the success function is different. It is a flatter structure. So in the above example, we wrap it inside another object so that we can reuse the same handler.
I haven't tried the webOS browser, yet or the Nokia WebKit variant yet. Here's hoping they are close to the standard used by Safari.

Tuesday, November 10, 2009

Clojures Primes Shootout

Back in May, I was working on my JavaOne talk on JVM language performance. The first comparison was a prime number sieve. If you look back at the slides, you might notice that I did not include a Clojure implementation. There were a couple of reasons for this. First, I was dumb. Not only dumb, but especially dumb when it came to Clojure. I had recently learned it -- mostly on an airplane to Israel earlier that month. Second, realizing that I was dumb, I was at least smart enough to reach out to the Clojure community for help. I got some great help on the other algorithms, but the prime sieve was unsatisfactory. The implementations I got were just too different from the algorithms used for the other languages, that it did not seem like a good comparison. I have been meaning to go back and come up with a satisfactory implementation. And before you ask, I know about this blazingly fast one:

(defn lazy-primes []
  (letfn [(enqueue [sieve n step]
            (let [m (+ n step)]
              (if (sieve m)
                (recur sieve m step)
                (assoc sieve m step))))
          (next-sieve [sieve candidate]
            (if-let [step (sieve candidate)]
              (-> sieve
                (dissoc candidate)
                (enqueue candidate step))
              (enqueue sieve candidate (+ candidate candidate))))
          (next-primes [sieve candidate]
            (if (sieve candidate)
              (recur (next-sieve sieve candidate) (+ candidate 2))
              (cons candidate
                (lazy-seq (next-primes (next-sieve sieve candidate)
                            (+ candidate 2))))))]
    (cons 2 (lazy-seq (next-primes {} 3)))))

and this one that is very close to what I wanted:

(defn sieve [n]
  (let [n (int n)]
    "Returns a list of all primes from 2 to n"
    (let [root (int (Math/round (Math/floor (Math/sqrt n))))]
      (loop [i (int 3)
             a (int-array n)
             result (list 2)]
        (if (>= i n)
          (reverse result)
          (recur (+ i (int 2))
                 (if (< i root)
                   (loop [arr a
                          inc (+ i i)
                          j (* i i)]
                     (if (>= j n)
                       arr
                       (recur (do (aset arr j (int 1)) arr)
                              inc
                              (+ j inc))))
                   a)
                 (if (zero? (aget a i))
                   (conj result i)
                   result)))))))

and of course the one from contrib:

(defvar primes
  (concat 
   [2 3 5 7]
   (lazy-seq
    (let [primes-from
   (fn primes-from [n [f & r]]
     (if (some #(zero? (rem n %))
        (take-while #(<= (* % %) n) primes))
       (recur (+ n f) r)
       (lazy-seq (cons n (primes-from (+ n f) r)))))
   wheel (cycle [2 4 2 4 6 2 6 4 2 4 6 6 2 6  4  2
   6 4 6 8 4 2 4 2 4 8 6 4 6 2  4  6
   2 6 6 4 2 4 6 2 6 4 2 4 2 10 2 10])]
      (primes-from 11 wheel))))
  "Lazy sequence of all the prime numbers.")

Instead I came up with my own bad one...

(defn primes [max-num](
  loop [numbers (range 2 max-num) primes [] p 2]
    (if (= (count numbers) 1)
      (conj primes (first numbers))
      (recur (filter #(not= (mod % p) 0) numbers) (conj primes p) (first (rest numbers))))))

Why is this so bad? Well it is horribly inefficient. It creates a list of integers up to the max-num that is the only input to the function. It then loops, each time removing all multiples of each prime -- just like a sieve is supposed to. However, most sieves take a prime p and remove 2p, 3p, 4p, etc. (often optimized to 3p, 5p, 7p, since all the even numbers are removed at the beginning.) This does the same thing, but it uses a filter to do it. So it goes through each member of the list to check if it is divisible by p. So it does way too many calculations. Next, it is constantly generating a new list of numbers to pass back into the loop. Maybe Clojure does some cleverness to make this not as memory inefficient as it sounds, but I don't think so.
So what is it that I did not like about the other many examples out there or suggested by the community? Most of them used a lazy sequence. That's great and very efficient. However, it's not a concept that is easy to implement in other, less functional languages (Go Clojure!) On the other hand, the Rich Hickey example is much closer to what I wanted to do. However, it uses Java arrays. There is no duplication of the list and the non-primes are eliminated efficiently. Using Java arrays just seems non-idiomatic to say the least. The other examples did for JavaOne all used variable length lists instead of arrays.
Anyways, I am satisfied to at least have a seemingly equivalent, idiomatic Clojure implementation. I looked at optimizing it through using type hints. I actually had to use a lot of type hints (4) to get a 10% decrease in speed. Anyways, not that this is out here, others will improve it, as that shouldn't be too hard to do!

Tuesday, November 03, 2009

Persistent Data Structures

A few weeks ago I blogged about concurrency patterns and their relative tradeoffs. There were some really poor comments from Clojure fans -- the kind of abusive language and trolling that reminded me of Slashdot in its heyday. In fact, for the first time in a very long time, I had to actually delete comments... A lot of people get confused by the fact that I used Clojure as a representative for software transactional memory, and thought that I was talking about concurrency in Clojure. Anyways, I was flattered that Stuart Halloway commented on my blog. He said I needed to watch Rich Hickey's talk on persistent data structures. So I did. I wanted to embed it on my blog, but it didn't look like InfoQ supports that. So I stole his slides instead:

Persistent Data Structures And Managed References

View more documents from michael.galpin.

A little after all of this, I learned that persistent data structures were coming to Scala. So a lot of folks seem to think that these are pretty important. So why is that exactly? It was time to do some homework...

You can read some basic info on Wikipedia. You can go to the original source, Phil Bagwell's paper. The latter is very important. What's amazing is that these are relatively new ideas, i.e. the science behind this is less than ten years old. Anyways, I would say that the important thing about persistent data structures is that they are data structures designed for versioning. You never delete or for that matter change anything in place, you simply create new versions. Thus it is always possible to "rollback" to a previous version. Now you can see how this is a key ingredient for software transactional memory.

Going back to the original ideas... Bagwell's VLists are the foundation for persistent arrays and thus persistent hashtables in Clojure. These form the foundation of Clojure's Multi-Version Concurrency Control, which in turn is the basis of its STM.

These are not just persistent data structures, they are a very efficient implementation of the persistent data structure concept. Each incremental version of a list shares common data with the previous version. Of course you still pay a price for getting the built-in versioning, but this is minimized to some degree.

Scala fans should also be interested in persistent data structures. They are coming to Scala, which is supposed to be available in beta by the end of this month. The aforementioned Bagwell did his work at EPFL -- the same EPFL that is the epicenter of Scala. Indeed, Bagwell himself has contributed to improving the implementations of VLists in Scala. With VLists in tow, Scala STM is just around the corner.

Tuesday, October 27, 2009

Minimalism? Programming? Huh?

Last week, I was at the Strange Loop conference, and it was a great conference. It inspired one blog post that managed to piss some Clojure people off. So here is another. This time I'm going after the closing keynote, Alex Payne. Here are his slides, including detailed notes.

Strange Loop 2009 Keynote: Minimalism in Computing

View more presentations from Alex Payne.

So where to begin... Even though I am going to harsh on this, I will give it very high marks. Why? Because it was thought provoking. If something makes you think long and hard about how and why you do your job, that is a good thing. I worry about people who do not question their assumptions on a regular basis.
Ok, so now on to what you've been waiting for -- the harshness. I hate software engineering analogies. Well, not all, but most. I hate when people try to romanticize about what they do by making a far-fetched analogy to something more glamorous. If you're a programmer, you're not a rock star. You're not an artist. You're not a musician. You're not even a scientist. Sorry. Despite having the word "architect" in my official job title, I would add that you're not an architect either.

You're a programmer. At best, you're an engineer, and that is really stretching it at times. So obviously if I find it ridiculous to compare programming to things like creating art or making music, it seems even more ridiculous to compare the output of programming to art or music.

That being said, I enjoy programming. It is not just a job. I also take pride in my work. I like to show off my code as much as the next guy.

From this perspective, I can definitely appreciate Alex's thoughts on minimalism. Or perhaps more generally, associating some subjective qualities with programming. If an analogy to art or construction or whatever helps one express those subjective qualities, then fine. Just don't forget who you are, and maybe even try to find some pride in just that.

Now I should really end this post now, on kind of a feel good note. But I won't. Instead, I want to touch on some of the *ahem* deeper thoughts that Alex's talk provoked. I was reminded of a paper I read several years ago that tried to explain two obviously overly general (and probably not politically correct) things: why Americans are good at software but not cars, while the Japanese are good at cars, but not software. I tried to find a link to this paper, and was quite sure that I had saved it in Delicious. However I could not find, and I'm convinced that Delicious lost it. Maybe it fell out of cache...

Anyways, the crux of the argument is that Americans are good at getting something done quickly, even though the quality may be poor. So we are good innovators, but terrible craftsmen. This is a good fit for software, but obviously not cars.

If you accept this idea, and it has its merits, then in a world of low quality, rapidly written code, is there any room for subjective qualities? Why should you take any pride in your work at all if its value is directly proportional to how quickly you can produce it, throw it away, and move on to something else? To some degree, doesn't the software world's affection with agile development codify these almost nihilistic ideals? Perhaps the propensity to try and compare programming to art is simply an act of desperation caused by the realization that you'd be better off giving up on good design and instead you should invest in duct-tape.

Anyways, this is a close-to-home topic for me. My day job often involves telling developers the "right" way to do something, and the more unsavory flip-side of this, telling developers when they've done things the "wrong" way. You don't make a lot of friends, but I'm an INTJ so it works for me. Every once in awhile, I work with a programmer who writes beautiful code. Yeah, I said it. Any such programmer knows that they write good code, even if they don't like to talk about it. When you write good code, you constantly see bad code around you, and you can't help but get a big ego. I always compliment the good code, and it always surprises the arrogant pricks. They think I'm blown away and have never seen such good code. In reality, I just feel sorry for them.

Sunday, October 25, 2009

Social Technology Fail

This is the kind of posting that needs a disclaimer. I'm going to talk a little about recent changes at Facebook and Twitter, but strictly from a technology perspective. It goes without saying that I have no idea what I'm talking about. I am fortunate enough to be acquaintances with several engineers at both companies, and I have a college classmate (and fellow Pageboy) who seems to be a pretty important dude at Facebook, but I have no extra knowledge of these companies' technology than anybody else. So just to repeat: I have no idea what I'm talking about. You should really stop reading.

Since you are still reading, then I will assume that you too enjoy being an armchair architect. Since my day job is as an architect at eBay, I tell myself that exercises such as this make me better at my job. Heh heh. Let's start with Facebook.

For several months now, I've noticed an interesting phenomenon at Facebook. My news feed would often have big gaps in it. I have about 200 friends on Facebook, and I'd say that around 70% of these friends are active, and probably 20-25% are very active Facebook users. So at any time I could look at my feed, and there would be dozens of posts per hour. However, if I scrolled back around 3-4 hours, I would usually find a gap of say 4-6 hours of no posts. The first time I ever noticed this, it was in the morning. So I thought that this gap must have been normal -- people were asleep. Indeed, most of my friends are in the United States. However, I started noticing this more and more often, and not always in the morning. It could be the middle of the day or late at night, and I would still see the same thing: big gaps. So what was going on?

Well here's where the "I don't know what I'm talking about" becomes important. Facebook has been very happy to talk about their architecture, so that has given me speculation ammo. It is well known that Facebook has probably the biggest memcached installation in the world, with many terabytes of RAM dedicated to caching. Facebook has written about how they have even used memcached as a way to synchronize databases. It sure sounds a lot like memcached has evolved into something of a write-through cache. When you post something to Facebook, the web application that you interact with only sends your post to the cache.

Now obviously reads are coming from cache, that's usually the primary use case for memcached. Now I don't know if the web app can read from either memcached and a data store (either a MySQL DB, or maybe Cassandra?) or if Facebook has gone for transparency here too, and augmented memcached to have read-through cache semantics as well. Here's where I am going to speculate wildly. If you sent all your writes to a cache, would you ever try to read from anything other than the cache? I mean, it would be nice to only be aware of the cache -- both from a code complexity perspective and from a performance perspective as well. It sure seems like this is the route that Facebook has taken. The problem is that not all of your data can fit in cache, even when your cache is multiple terabytes in size. Even if your cache was highly normalized data (which would be an interesting setup, to say the least) a huge site like Facebook is not going to squeeze all of their data into RAM. So if your "system of record" is something that cannot fit all of your data... inevitably some data will be effectively "lost." News feed gaps anyone?

Maybe this would just be another useless musing -- an oddity that I noticed that maybe few other people would notice, along with a harebrained explanation. However, just this week Facebook got a lot of attention for their latest "redesign" of their home application. Now we have the News Feed vs. the Live Feed. The News Feed is supposed to be the most relevant posts, i.e. incomplete by design. Now again, if your app can only access cache, and you can't store all of your data in cache, what do you do? Try to put the most "relevant" data in cache, i.e. pick the best data to keep in there. Hence the new News Feed. The fact that a lot of users have complained about this isn't that big of a deal. When you have a very popular application, any changes you make are going to upset a lot of people. However, you have to wonder if this time they are making a change not because they think it improves their product and will benefit users overall, but if instead it is a consequence of technology decisions. Insert cart before horse reference here...

Facebook has a great (and well deserved) reputation in the technology world. I'm probably nuts for calling them out. A much easier target for criticism is Twitter. I was lucky enough to be part of their beta for lists. Now lists are a great idea, in my opinion. Lots of people have written about this. However, the implementation has been lacking to say the least. Here is a very typical attempt to use this feature, as seen through the eyes of Firebug:

It took my five attempts to add a user to a list. Like I said, this has been very typical in my experience. I've probably added 100+ users to lists, so I've got the data points to back up my statement. What the hell is going on? Let's look at one of these errors:

Ah, a 503 Service Unavailable response... So it's a temporary problem. In fact look at the response body:

I love the HTML tab in Firebug... So this is the classic fail whale response. However, I'm only getting this on list requests. Well, at the very least I'm only consistently getting this on list requests. If the main Twitter site was giving users the fail whale at an 80% clip... In this case, I can't say exactly what is going. I could try to make something up (experiments with non-relational database?)
However, this is much more disturbing to me than what's going on at Facebook. I don't get how you can release a feature, even in beta, that is this buggy. Since its release, Twitter has reported a jump in errors. I will speculate and say that this is related to lists. It would not be surprising for a feature having this many errors to spill over and affect other features. If your app server is taking 6-10 seconds to send back (error) responses, then your app server is going to be able to handle a lot less requests overall. So not only is this feature buggy, but maybe it is making the whole site buggier.
Now, I know what we (eBay) would do if this was happening: We'd wire-off the feature, i.e. disable it until we had fixed what was going wrong. Twitter on the other hand...

Huh? You've got a very buggy feature, so you're going to roll it out to more users? This just boggles my mind. I cannot come up with a rationale for something like this. I guess we can assume that Twitter has the problem figured out -- they just haven't been able to release the fix for whatever reason. Even if that was the case, shouldn't you roll out the fix and make sure that it works and nothing else pops up before increasing usage? Like I said, I just can't figure this one out...

Thursday, October 22, 2009

Concurrency Patterns: Java, Scala, and Clojure

Today I was at The Strange Loop conference. The first talk I attended was by Dean Wampler on functional programming in Ruby. Dean brought up the Actor concurrency model and how it could be done in Ruby. Of course I am quite familiar with this model in Scala, though it is copied from Erlang (which copied it from some other source I'm sure.) The next talk I went to was on software transactional memory and how it is implemented in Clojure. I had read about Clojure's STM in Stuart Halloway's book, but I must admit that it didn't completely sink in at the time. I only understood the basic idea (it's like a database transaction, but with retries instead of rollbacks!) and the syntax. As I've become more comfortable with Clojure in general, the idea has made more and more sense to me.

Now of course, no one size fits all. There are advantages and drawbacks to any concurrency model. A picture of these pros and cons kind of formed in my head today during the STM talk. I turned them into pictures. Here is the first one:

This is meant to be a measure of how easy it is to write correct concurrent code in Java, Scala, and Clojure. It is also meant to measure how easy it is to undersand somebody else's code. I think these two things are highly correlated. I chose to use these various languages, though obviously this is somewhat unfair. It wold be more accurate to say "locking/semaphores" instead of Java, the Actor model of Scala, and software transactional memory instead of Clojure -- but you get the point. So what does this graph mean?
Well obviously, I think Java is the most difficult language to write correct code. What may be surprising to some people is that I think Clojure is only a little simpler. To write correct code in Clojure, you have to figure out what things need to be protected by a dosync macro, and make sure those things are declared as refs. I think that would be an easy thing to screw up. It's still easier than Java, where you have to basically figure out the same things, but you must also worry about multiple lock objects, lock sequencing, etc. In Clojure you have to figure out what has to be protected, but you don't have to figure out how to protect it -- the language features take care of that.
So Clojure and Java are similar in difficulty, but what about Scala and the Actor model? I think this is much easier to understand. There are no locks/transactions. The only hard part is making sure that you don't send the same mutable object to different actors. This is somewhat similar to figuring what to protect in Clojure, but it's simpler. You usually use immutable case classes for the messages sent between actors, but these are used all over the place in Scala. It's not some special language feature that is only used for concurrency. Ok, enough about easy to write/understand code, there are other important factors, such as efficiency:

Perhaps this should really be described as memory efficiency. In this case Java and the locking model is the most efficient. There is only copy of anything in such a system, as that master copy is always appropriately protected by locks. Scala, on the other hand, is far less efficient. If you send around messages between actors, they need to be immutable, which means a lot of copies of data. Clojure does some clever things around copies of data, making it more efficient than Scala. Some of this is lost by the overhead of STM, but it still has a definite advantage over Scala. Like in many systems, there is a tradeoff between memory and speed:

Scala is the clear king of speed. Actors are more lightweight than threads, and a shared nothing approach means no locking, so concurrency can be maximized. The Java vs. Clojure speed is not as clear. Under high write contention, Clojure is definitely slower than Java. The more concurrency there is, the more retries that are going on. However, there is no locking and this really makes a big deal if there are a lot more reads than writes, which is a common characteristic of concurrent systems. So I could definitely imagine scenarios where the higher concurrency of Clojure makes it faster than Java. Finally, let's look at reusability.

By reusability, I mean is how reusable (composable) is a piece of concurrent code that you write with each of these languages/paradigms? In the case of Java, it is almost never reusable unless it is completely encapsulated. In other words, if your component/system has any state that it will share with another component, then it will not be reusable. You will have to manually reorder locks, extend synchronization blocks, etc. Clojure is the clear winner in this arena. The absence of locks and automatic optimistic locking really shine here. Scala is a mixed bag. On one hand, the Actor model is very reusable. Any new component can just send the Actor a message. Of course you don't know if it will respond to the message or not, and that is a problem. The bigger problem is the lack of atomicity. If one Actor needs to send messages to two other Actors, there are no easy ways to guarantee correctness.
Back in June, I heard Martin Odersky says that he wants to add STM to Scala. I really think that this will be interesting. While I don't think STM is always the right solution, I think the bigger obstacle for adoption (on the Java platform) is the Clojure language itself. It's a big leap to ask people to give up on objects, and that is exactly what Clojure requires you to do. I think a Scala STM could be very attractive...

Tuesday, October 20, 2009

Don't Dream It's Over

Sometimes you need some 80's music. While you listen to that song, I want you to think about something: the Opera browser. As an OG geek, I used to use Opera -- I even paid for it. It was so much better than IE and that was back in the day when there was just the Mozilla Suite Monster, no Firefox. Sure, there were sites that didn't work well in it, or that actively discriminated against it (including my current employer...) That was ok. It was so much faster than anything else out there, that it didn't matter.

As the world has turned over the years, history has proved Opera right. How fast your browser is does matter. How secure your browser is does matter. There is plenty of room for innovation in the browser space: tabs, download managers, speed dial, magic wand, etc. So many of Opera's ideas have been taken by the Firefoxes, Safaris, and most lately, the Chromes of the world, and touted as innovations -- that were then copied by IE. Meanwhile, what's happened to Opera? It doesn't always pay to be right.

I got news for you, Opera is still kicking butt. There has been a renewed focus on browser technology, and in particular what is collectively known as HTML 5. Apple, Google, Mozilla, and even Microsoft all like to talk about how awesomely they implement the HTML 5 specifications. Turns out they are still way behind Opera. Don't believe me? Take a look at @ppk's HTML 5 browser comparison. Or perhaps you are more visual...

Code (courtesy of a colleague):

<form> 
    <datalist id="mylist"> 
        <option label="Mr" value="Mr"> 
        <option label="Ms" value="Ms"> 
        <option label="Professor"value="Prof"> 
    </datalist> 
    <div class="entry"> 
        <label for="form-1">Name (required) </label> 
        <input id="form-1" name="name" type="text" autofocus required> 
        ← autofocus here </div> 
    <div class="entry"> 
        <label for="form-2">Title</label> 
        <input id="form-2" name="title" list="mylist" type="text"> 
    </div> 
    <div class="entry"> 
        <label for="form-4">Age</label> 
        <input id="form-4" name="age" type="number" min="18" max="25"> 
    </div> 
    <div class="entry"> 
        <label for="form-5">Email (required)</label> 
        <input id="form-5" name="email" type="email" required> 
    </div> 
    <div class="entry"> 
        <label for="form-6">Blogs</label> 
        <input id="form-6" name="url" type="url"> 
    </div> 
    <div class="entry"> 
        <label for="form-7">Date of Birth</label> 
        <input id="form-7" name="dob" type="date"> 
    </div> 
    <div class="entry"> 
        <label for="form-8">Attractiveness </label> 
        <input id="form-8" name="a" type="range" step="0.5" min="1" max="10" value="5"> 
        <output name="result" onforminput="value=a.value">5</output> 
    </div> 
    <div class="button"> 
        <button type=submit>Submit</button> 
    </div> 
</form>

Opera:

Latest Chromium Nightly:

As you can see, Opera does a great job of implementing many of the new markups in HTML 5, while Chrome .... not so much. Big deal, right? You can do all of these things with JavaScript you say. Yeah, but not only will that be a lot of heavy JS, but it will lose semantics. Don't think that matters? Well, not only do those semantics matter to folks who use screen readers, but it also matters to the most important user of the Internet: googlebot.

Anyways, the point is that once again Opera is leading the way, not the folks who talk the loudest about pushing browser technology.

Note: This blog post created with Opera.

Monday, October 19, 2009

HTML 5 Features on Mobile Browsers

Earlier today I looked for some help from the smarty folks that I follow on Twitter:

Both @sophistifunk and my colleague @rragan responded that I should look at @ppk's HTML 5 compatibility table and WebKit comparison table. Now these are great resources, and I already had an appreciation for them. However, even put together, they do not, in my opinion, accurately describe the state of mobile web browser capabilities.

Let me take a step back. First it should be said that I hate web standards. Seriously. As someone who has spent much of his adult life developing web applications, I understand why people like standards. Having to write code that is specific for each browser will drive you insane. I don't want to do that anymore than anyone else. However, standards are ex post facto so to speak. It is great to take things and create standards around them, but it is a losing battle to look to standards for innovation. If we had done that, there would be no such thing as Ajax -- or it could only be implemented using hidden IFrames...

With that in mind, I have never given much faith to HTML 5 as a standards lead revolution. This is one of the flaws in the compatibility/comparison tables. Let me give you a couple of examples. The WebKit comparison table shows that neither Android 1.0 or 1.5 supports geolocation (I know, I know, geolocation is a separate specification, but practically it is lumped in with HTML 5). Anybody with an Android phone knows that this is patently false. The Android browser defaults to Google's mobile home page, which uses the geolocation API to echo your location to you -- if you browser supports geolocation. So did @ppk just get this wrong? No, not really. The catch is that the Android browser is not just WebKit, it is also Gears, which implements the HTML 5 geolocation completely. So anybody using an Android phone (and there's going to be a lot of such folks very soon), has a geolocation-enabled browser. But you would not know this from reading the WebKit table.

Another example is app cache. This is specification meant to enable offline applications. Again the WebKit table says that it is not supported in any Android browser. However, this functionality is once again implemented using Gears. In this case, I don't think it follows the specification exactly. Oh well.

Oh, and one last nitpick... The HTML 5 comparison does not even mention database APIs. This is supported by the latest versions of the browsers in the iPhone and Android (and it has been for 1yr+ in both I think.) Yeah I know, @ppk can only run so many tests, and that will probably be listed in the future...

I'm really not trying to diss @ppk and the data provided on quirksmode. It is tricky to assess mobile browser capabilities. That's why I was trying to be lazy in the first place and hope that somebody had done all of the hard work for me!

Sunday, October 18, 2009

The IntelliJ IDEA Bomb

In case you missed the announcement from JetBrains yesterday, the popular IntelliJ IDEA has gone free and open source. Sort of. There is now a community edition, that is FOSS, and an enterprise edition that you must pay for. Why is this a big deal? Read on.

IntelliJ really revolutionized Java development. It wasn't the first IDE to allow for code completion, or even the first Java IDE to do this. However, it was definitely a pioneer in code refactoring. And this was huge. It took many of the code improvement ideas catalogued by Martin Fowler, and turned them into simple commands that any programmer could use. This also allowed for things like code navigation, where you could go from the usage of a class or a method to the implementation (or at least the declaration, if the usage only referred to the interface.) This was only the beginning. IntelliJ was a pioneer of bringing in the Java ecosystem. I remember how much easier IntelliJ made it to use Struts, for example.

This reminds me of my own personal use of IntelliJ over the years. When I first started working in enterprise Java development, I was working for a startup that built EJB applications targeted for WebLogic and WebSphere. For WebSphere, we used Visual Age for Java. For WebLogic, we just used Kawa (a bare bones editor.) I hated Visual Age with a passion. It scarred me badly. For years afterwards, I stuck to simple text editors.

Several year later (2003), I joined a fresh startup, KeepMedia (later renamed to MyWire). Prior to KeepMedia, I had a short stint with a .NET startup, Iteration Software (later renamed to Istante, and sold to Oracle.) There I used Visual Studio quite a bit, and appreciated the productivity gains it provided. So at KeepMedia, I took a look at various Java IDEs: JBuilder, JDeveloper, and IntelliJ. I was one of three programmers at KeepMedia, and one of my colleagues was really in love with Struts. He worked on the front of KeepMedia, but I worked mostly on our back end system that integrated magazine content from publishers. I didn't have to use Struts for that obviously, but I did occasionally help with the front end (we had three programmers after all.) I hated Struts, but IntelliJ made it more tolerable. The only negative about it was that it was flaky on Linux, but pretty much everything was flaky on Linux back then.

After I worked at KeepMedia, I worked at a consulting company, LavaStorm Engineering (2004). I had been using IntelliJ for a long time, and was convinced that everything else was crap. One of my colleagues introduced me to Eclipse. I was horrified. Most of Eclipse's codebase came from an all-Java rewrite of Visual Age (the version I had used was written in Smalltalk.) So even though it shared no code with the beast that I had hated, it shared a lot of the look-and-feel. For example, take a look at the outline view in Eclipse (any version, even the most recent.) This was completely taken from Visual Age. Even the icons are exactly the same. There was no way I was going to give up my IntelliJ for that thing...

A couple of years later, I was at another startup, Sharefare (which later changed its name to Ludi Labs.) Being a startup, there was no standardization arounds tools. However, everyone used Eclipse. I had warmed up to Eclipse (though I still preferred IntelliJ), and I had started writing Eclipse related articles for IBM. So I went with Eclipse there, as there were advantages to everyone using the same IDE (being able to share .project/.classpath, plugins, etc.)

After Ludi Labs went down in early 2007, I joined eBay. To say we're a major Eclipse shop, would be putting it mildly...

Eclipse @eBay 2009

View more presentations from michael.galpin.

We're not the only major Java shop that has invested heavily in Eclipse plugins. If you're doing GWT, App Engine, or Android development, then you're probably using Eclipse too. I know I am. I didn't really start using IntelliJ again, until the past year. The reason was simple: Scala. It is the best IDE for doing Scala development currently. Actually IntelliJ's Scala offering vs. the competition, is similar to anything else. It's not that it necessarily has a lot more features, but it has similar and most importantly, it is of higher quality. Here's a great visualization of this key difference:

This is why the open sourcing of IntelliJ is important. It's not that we didn't already have great open source IDEs for Java. No, it's more about what does this mean for IntelliJ. Will the attention to detail go down hill? Will the stable plugin ecosystem be disrupted?

Of course, for most current and would-be users of IntelliJ, the availability of a free IntelliJ is really not big news. The free version is really quite limited. You're not going to build web apps, or apps that connect to databases, use web services, etc. Not with the community version, unless in-the-wild plugins became available that enable this. IntelliJ has awesome support for all of these things, but those parts of IntelliJ remain behind a price wall.

There is one notable exception here: Scala. Right now, IntelliJ has the best Scala support. It has more features and less bugs than NetBeans, and it is much more stable than Eclipse. Scala support is one of the features supported in the community edition. That means a lot of newbie Scala developers can just use IntelliJ. This is great news. I would not give NetBeans or Eclipse to Scala newbies, for the simple reason that both of them report syntax errors inconsistently. In other words, both are guilty of either not identifying an error, or identifying a false error (or both.) That's anathema for somebody learning a language. I'm able to put up with it, simply because I know just enough Scala to say "you're wrong IDE" at times. I've rarely seen this happen with IntelliJ.

Tuesday, October 13, 2009

Scala Manifests FTW

I came across a piece of code written by a colleague. It was a flexible XML/JSON parser. It would turn an XML or JSON structure into a map. The keys were strings. The values were either strings, lists, or maps. The lists could be lists of strings, lists, or maps. The maps had strings as keys and value as (wait for it) strings, lists, or maps. We had run across a bug recently. Usually a particular web service returned data that looked something like:

{ "details" : { "a" : "x", "b" : "y" } }

So we had code that looked like :

val response = // code that called the parser
val foo = response("details").asInstanceOf[Map[String,String]]("a")

However, one day we got some bad data:

{ "details" : "" }

So of course the earlier code blew up. I wanted to have something like this:

trait SafeMapTrait {
    def getString(key:String):String
    def getList(key:String):List[AnyRef]
    def getMap(key:String):Map[String, AnyRef]
}

Now this can be accomplished pretty easily:

class EasySafeMap(val map:Map[String, AnyRef]){
  def getString(key:String):String = {
    if (map.contains(key)){
      if (map(key).isInstanceOf[String]) map(key).asInstanceOf[String] else null
    } else null
  }
  // etc.    
}

There would be similar methods for lists and maps. I didn't like this, and thought I should be able to do better. Looking at the final solution, I'm not sure that I did. But I did learn some things about Scala Manifests... Before we get there, let's look at me first naive attempt to do better:

class NotSoSafeMap(val map:Map[String,AnyRef]){

  def getString(key:String):String = getType(key)
  def getList(key:String):List[AnyRef] = getType(key)
  def getMap(key:String):Map[String,AnyRef] = getType(key)

  private def getType[T](key:String):T  = {
    val value = map.getOrElse(key, null)
    if (value != null && value.isInstanceOf[T]) value.asInstanceOf[T] else null
  }

}

That would have been, huh? I really wanted to use a parameterized method for the extraction, comparison, casting. The problem with this is that there is no way to know the type T. You could explicitly add the parameter, i.e. getType[String](key) but it doesn't help because of erasure. I tried this instead:

class NotSoSafeMap(val map:Map[String,AnyRef]){

  def getString(key:String):String = getType(key,null)
  def getList(key:String):List[AnyRef] = getType(key,null)
  def getMap(key:String):Map[String,AnyRef] = getType(key,null)

  private def getType[T](key:String, default:T):T  = {
    val value = map.getOrElse(key, default)
    if (value.isInstanceOf[T]) value.asInstanceOf[T] else default
  }

}

I thought that this might be better because of the type information being given in the default value. This didn't work. Using the null default seemed dumb, but even adding defaults like the empty string, an empty list, etc. did not help. Erasure was once again kicking my ass. So it was time to learn about Manifests.
I had heard Jorge Ortiz talk about manifests previously. He has also written an excellent blog post about them. He told me that these were still "experimental" (i.e. undocumented) in Scala 2.7.x, but were officially part of the upcoming 2.8 release. Sounded good to me. Here is the solution I came up with:

class SafeMap(val map:Map[String,AnyRef]){
  import scala.reflect.Manifest

  def getString(key:String):String = getType[String](key) match {
    case Some(s:String) => s
    case _ => null
  }

  def getMap(key:String):Map[String, AnyRef] = getType[Map[String,AnyRef]](key) match {
    case Some(m:Map[String, AnyRef]) => m
    case _ => null
  }

  def getList(key:String):List[AnyRef] = getType[List[AnyRef]](key) match {
    case Some(list:List[AnyRef]) => list
    case _ => null
  }

  private def getType[T](key:String)(implicit m:Manifest[T]):Option[T] = {
    map.getOrElse(key, null) match {
      case a:AnyRef => if (m >:>  Manifest.classType(a.getClass)) Some(a.asInstanceOf[T]) else None
      case null => None
    }
  }
}

Ok, a few things to note here. First the local import of scala.reflect.Manifest. Again it's not a documented class, but it's in there. Now my getType method. Notice that it uses the function_name (param:type) (param:type) syntax. Also notice the implicit Manifest parameter. The callers don't add this, the compiler adds it for you. Next notice that it returns an Option class. I wanted it to just return T. However, I could not have a case where it returned null if T was the declared return type of the method. So I went with Option. Finally, notice the Manifest magic. That's the m >:> Manifest.classType(a.getClass). The right hand side of the call uses a factory method in the Manifest singleton object, to create a Manifest for the (class of the) value coming back from the map. The >:> operator checks to see if the right hand side represents a subclass of the left hand side. This is important. For the getMap method, the manifest will represent the Map trait (actually a Java interface in this case.) The call to a.getClass gives you the runtime class of a. Of course this runtime class implements the Map trait, but you can't do equality comparison. Hence the >:> operator. One last thing, notice that the getString method uses the explicit getType[String]. You would think that the compiler could infer this since the left hand is explicitly declared as a String. It doesn't. When I tried it without the explicit type parameter, my manifest would always Manifest[Nothing].

Saturday, October 10, 2009

The End of The Aughties

There are 72 days left in the Aughties, y'know the current decade: 2000 - 2009. I was looking back at the decade, and what are its most important events. Here's my little list:

9/11 -- This is obvious. September 11, 2001 is clearly one of the most pivotal days in the history of the United States. In the previous century, there are probably only a couple of comparable events: the bombing of Pearl Harbor, V-E day, the moon landing, the JFK assassination. For several generations of Americans, 9/11 will be the most historical day of their life.

The Election of Barack Obama -- President Obama's election was historical in so many ways. Obviously it was historic that an African American was elected President. It also marked a transition to a new generation -- Obama is 15 years younger than Bush or Clinton (and let's not even mention McCain.) Obama is not only a Democrat, but is not from the more conservative, Southern Democrats of Clinton and Jimmy Carter.

Social Media -- Here's where maybe my perspective is skewed by living in Silicon Valley. Social media is not a single event, in fact it is a progression of events. To me, it really started with blogging and YouTube, and then exploded with MySpace, Facebook, and Twitter. It is a fundamental change in the Internet. Every user is a creator of content, as well as a consumer. It is the great democratizing effect of the Internet, and it is only getting started. Even now we are starting to see how businesses, celebrities, etc. realize that not only can they use social media as a channel to customers and fans, but that it is a two-way channel.

Hurricane Katrina -- What made Hurricane Katrina so pivotal is that opened the eyes of Americans. It made people realize that many of their fellow Americans live in awful conditions. The divide between socioeconomic classes in America were never so obvious as during Katrina. When Kanye West went on TV and said that George Bush didn't care about black people, he wasn't just being a jackass, he was stating a sentiment shared by a lot of people.

The iPhone -- What did I say earlier about having a Silicon Valley perspective? Anyways... The iPhone has completely changed so many things for so many people. In the 90's, The Internet changed people's lives by bringing them information. Now the iPhone lets them carry it around in their pocket. Other phones were certainly moving in that direction, but the iPhone broke through by combining a large display with highly usable touch based interface. This revolution continued with the release of the App Store. Now don't get me wrong. A lot of other phones are following suit -- but that's exactly why the iPhone was so historical.

That's my short list. I know it's obviously biased from me being American and living in Silicon Valley. What did I miss? What doesn't belong?

Thursday, October 08, 2009

San Francisco Giants 2009

The Giants had a pretty good season. They were in contention for a playoff spot until the final couple of weeks. But they didn't make it. Folks around here are busy talking about what the Giants need to do to take the next step. The answer is obvious all at once: they need better hitting. However the problem is that the Giants don't actually know how to evaluate hitting talent. So they really don't have any chance at getting good hitting without paying a huge price for it. Let's take a look at some numbers to understand this.

One easy way to evaluate the Giants is by looking at the players who have come up through their farm system. Of their homegrown players that had at least 200 plate appearances, on average they saw 3.62 pitches per plate appearance and walked in 7.2% of their plate appearances. To be fair, Fred Lewis has very good plate discipline (and his plate appearances went way down this year.) If you take him out, then the numbers are 3.52 pitches per plate appearance and a 6.8% walk rate. For comparison's sake, if you look at the top seven teams (in terms of runs scored), they average a walk in 9.8% of their plate appearances.

So the Giants farm system sucks. Maybe they can sign good hitters? Nope. If you look at their top hitters that they signed from other teams, they see 3.58 pitches per plate appearance and a 5.9% walk rate. That's right, they see a few more pitches, but they walk even less. The Giants even made a "big" trade at midseason, for Freddy Sanchez. Everyone is concerned that Freddy might have become injury prone suddenly. What they should really be worrying about is that Freddy sucks. He sees 3.81 pitches per plate appearance, which is not too bad. However, he only walks 4.5% of the time. Further, his 0.417 career slugging percentage is just awful, even for a second baseman (which is generally a strong offensive position in modern baseball.) So he doesn't swing at everything (he also has a low strikeout rate) but yet he still does not try to get a good pitch to hit and winds up playing for a single. This is your blockbuster trade material? This is what you get in exchange for the second best pitcher in your farm system?

Yeah, so clearly the Giants front office has no idea what makes a good hitter. People like to say that the Giants home park is a great pitchers park, as it is a tough place to hit home runs. That does not make it a tough place to take a bad pitch or take a walk. In fact, you would think that in such a park, they would an even higher premium on hitters who can get on base any way they can. It's funny, you often hear that the reason that the A's are no longer a good offensive team is because other teams figured out what they were doing. If you look at the Yankees and Red Sox, or for that matter the Rays and the Rockies, you can see evidence of this. However, clearly there is one team that has not figured things out and that is the Giants.

Tuesday, September 15, 2009

Scala Word Clouds

Just before I went to bed last night, I got an interesting tweet from Dave Briccetti. You can see from the link in tweet what the problem was, and if you look at the bottom of the page, you can see the clever solution by Jorge Ortiz. Right after I read the tweet from Dave, I asked my wife how long until the end of the TV that she was watching. She said seven minutes. I decided that wasn't enough time, so I went to sleep.

I didn't look at the problem again until today during lunch. By then, I saw Jorge's solution and I saw no way to improve on it. However, this inspired a little experiment for me. Actually it mostly reminded me of a programming problem that I was given a few years back by a well known company that will remain anonymous as they are very touchy about people talking about their interviewing process. The problem was essentially the same as Dave's problem -- take a bunch of sentences (instead of tweets), figure out the most frequent words. There was a twist though -- I was not allowed to use any Java collection. Actually I could not use anything outside of the java.lang package, so I was pretty much stuck with arrays...

At the time, I figured that I would have to either implement my own hash table and a sort, but then I came up with a way to solve the problem with no hash table. So when I saw Dave's problem, I thought "wow my old solution will be so much more awesome in Scala":

def createCloud(strings:Seq[String])={
  val x:List[String] = Nil
  var s = ("",0)
  val cloud = new scala.collection.mutable.ArrayBuffer[Tuple2[String,Int]]
  strings.map(_.split("\\s").toList.sort(_ < _)).foldLeft(x){mergeSorted(_, _)}.foreach( (a) => {
    if (a == s._1) 
      cloud(cloud.length - 1) = (a, s._2 + 1)
    else
      cloud += (a,1)
    s = cloud(cloud.length - 1)
  })
  cloud.toList.sort((a,b) => a._2 > b._2)
}

Ok, so I cheated this time and used an ArrayBuffer instead of array. In my solution years ago, I made an array whose size was the total number of words (worst case scenario.) I didn't have tuples, so I had to use two arrays. Anyways, the idea here is to split each sentence into a list of words, and then sort that list of words. Now merge together each sorted list, and keep things sorted using a merge sort. Of course you could turn everything into a big list first using flatMap, similar to Jorge's solution, and then sort. Anyways, you then roll up the sorted list, taking advantage of the fact that it is sorted. You never have to look up a word to see if it has been encountered before, because of the sorting. You just keep track of the last word encountered, and just compare the next word to the last word to determine if the word has been previously encountered and thus needs to be incremented. Of course you must be wondering about the aforementioned merge sort. Here is a Scala implementation of it:

def mergeSorted[A](comparator: (A,A) => Boolean)(x:List[A], y:List[A]):List[A] = {
  if (x.isEmpty) y
  else if (y.isEmpty) x
  else if (comparator(x.head, y.head)) x.head :: mergeSorted(comparator)(x.tail, y)
  else y.head :: mergeSorted(comparator)(x, y.tail)
}
  
def mergeSorted[A <% Ordered[A]](x:List[A], y:List[A]):List[A] = mergeSorted((a:A,b:A) => a < b)(x,y)

While writing this, I decided that I should touch up my knowledge of currying and partially applied functions in Scala, so I read about this on Daniel Spiewak's blog. The first version of mergeSorted is very generic and takes a comparator. This is what lead me into the world of currying. Actually, my first version of the above looked like this:

class MergeSorter[A](comparator: (A,A) => Boolean){
  def apply(x:List[A], y:List[A]):List[A] = {
    if (x.isEmpty) y
    else if (y.isEmpty) x
    else if (comparator(x.head, y.head)) x.head :: apply(x.tail, y)
    else y.head :: apply(x, y.tail)    
  }
}

I created a class to encapsulate the sort. In some ways I like this syntax a little better, but it made the second version of the sort a little uglier. I wanted a "default" sort that could be used for anything implementing Scala's Ordered trait. It is this second version that I used in the createCloud function above, as strings can be implicitly converted to RichString which implements Ordered.

It is still easy to use mergeSorted with objects that do not implement Ordered, or to provide a different sort function for those that do:

// provide your own sort
val sList = 
  mergeSorted((s1:String, s2:String) => s1.toLowerCase < s2.toLowerCase)(List("aa","aB","Cb"), List("Ab", "ca"))
// create a reusable sort
val listSort = mergeSorted( (a:List[Any], b:List[Any]) => a.length < b.length) _
val z = listSort(List(List("a","b"), List(1,2,3)), List(List(true,true,true), List("x","y","z","w")))

In the first example, we create a new sort by providing different sorting function that is case insensitive. It is then applied to a list of strings. In the second example, we create the sorting function and assign to a val. Notice the mysterious underscore _ at the end of the statement. This creates a partially applied function. If you forget the underscore, the Scala compiler actually provides a pretty useful error message about this. With this, we can then apply the function however we want.

Sunday, September 13, 2009

Some Tasty Scala Sugar

I have finally been getting around to working some more on Scala+Android. A lot needs to be done, especially since I am talking about it next month at the Silicon Valley Code Camp. Scala advocates (and I guess that includes me) often say that is very easy for application developers to use Scala, i.e. that "most" developers are never exposed to the more complex aspects of the language. The folks who "suffer" are APIs developers. I've mostly fallen into the first camp (app developers), but now that I'm trying to create a Scala API on top of the Android SDK, I get exposed to more aspects of Scala. Here are a couple of things that I found interesting.

Metaprogramming in Groovy and Scala

This morning I read an excellent pair of articles by Scott Davis on metaprogramming in Groovy. It reminded me of some of the different ways to approach this kind of problem. The second article in particular detailed a new feature in Groovy 1.6, the delegate annotation. Here is an example of using this feature:


public class Monkey {
 def eatBananas(){
   println("gobble gulp")
 }
}

public class RoboMonkey{
 @Delegate final Monkey monkey
 public RoboMonkey(Monkey m){
   this.monkey = m
 }
 public RoboMonkey(){
   this.monkey = new Monkey()
 }
 def crushCars(){
   println("smash")
 }
}

This allows for the following usage:


def monkey = new RoboMonkey()
monkey.eatBananas()
monkey.crushCars()

What is interesting here is that you cannot just create a Monkey and call crushCars on it. RoboMonkey is meant to be a way to add functionality to the Monkey class (in this case the crushCars method), while still being able to treat the RoboMonkey as if it was a Monkey. I point this out because of how this would be done in Scala:


class Monkey{
  def eatBanaas = println("gobble gulp")
}

class RoboMonkey(val monkey:Monkey){
  def this() = this(new Monkey)
  def crushCars = println("smash")
}

object Monkey{
  implicit def makeRobo(monkey:Monkey):RoboMonkey = new RoboMonkey(monkey)
  implicit def makeNormal(robo:RoboMonkey):Monkey = robo.monkey
}

Now arguably this is more complicated, because you have to create the Monkey object in addition to the Monkey class. What Scala requires is that you get those implicit functions (makeRobo, makeNormal) in scope. This is just one way to do that. The added benefit is the following usage:


 val monkey = new Monkey
 monkey.eatBanaas
 monkey.crushCars

I guess the Groovy way to do this is to use the meta class:


Monkey.metaClass.crushCars =  {-> println("smash")}
def monkey = new Monkey()    
monkey.eatBananas()
monkey.crushCars()

In both languages you can accomplish similar things, though with very different ways to implement it. Meta programming in Groovy is very powerful, and there are things you can do in Groovy that you cannot do in Scala -- see methodMissing and propertyMissing. Of course you lose some of the benefits of static typing, but software is all about tradeoffs.

Sunday, August 30, 2009

To Suggest or Not To Suggest

Ajax has been one of the most hyped technologies in recent memory. However, much of the hype is deserved. It really has changed the way we (web developers) build applications and the expectation of users. One of the archetypes of Ajax is the auto-suggest text box. I don't know who first came up with, but I first remember seeing it on Google. I think it was once a "special" version of the Google home page (Google Suggest?), but not it is standard:

It makes for a natural option on search text boxes. Here it is on some of the other search engines out there. Yahoo:

And on the search hotness, Bing:

Of course pure search engines are hardly the only sites that have search and thus have auto-suggest text boxes. It's pretty useful for ecommerce sites too...

As you can see, some sites have gotten creative with their suggestions. Here is another great example of that from Apple:

All of these examples are for search text boxes. If what you want is not suggested, you can still type it and the application will perform a search on it. A little more interesting example is on Facebook:

Here there is a "closed list" behind the suggest box: your friends. However, it is still a search box. If you type something that is not in the list, it will still perform a search that will return results. Of course, really all of the suggest boxes have a closed list behind them as well, but that list is probably much bigger than your list of friends on Facebook (unless you are Robert Scoble.) So the theme continues: there is a finite set of predetermined suggestions, but if you type in something not in that set, the application can still process your input.
Recently, I saw a different use for auto-suggest boxes: as a drop-in replacement for select/option boxes (a.k.a. drop-down box or combo box). This is fundamentally different than any of the examples above. It would be like the Facebook example, but with the limitation that your friends were the only valid input into the search box. In fact, Facebook has a scenario that is similar to this: tagging people in photos/videos:

However, even in this case, you can type the name of somebody who is not one of your friends. This is valid input. After all, maybe not everybody that you take pictures of has an account on Facebook. My kids are growing up fast, but my five year old son is not yet on Facebook...
I imagine that this pattern -- using auto-suggest box to replace a select/option box -- is used on websites out there. It seems like a reasonable thing to do to avoid a drop-down with a large number of choices, or a drop-down that is expensive to calculate and is often not used. However, it seems awkward, too. What do you do when a user types in something that is not in the box? I like to type, and I'm both reasonably fast at it and make enough spelling mistakes that this could be a common scenario from me.
Now I am no user experience expert. In fact, as a web developer, I would say that I am perhaps the least qualified person when it comes to judging user experience on the web. My knowledge makes me very forgiving of things that might be confusing to others, while it also makes me critical of things that others would not notice. So I am curious about other's people's experiences and opinions. Are there sites out there that use this pattern? Is it useful or awkward?

Saturday, August 15, 2009

A Tipping Point for Scala

This past week's BASE meeting was all about IDE support for Scala. You can read my notes, posted to the Scala tools mailing list. I was very surprised by this meeting. Not by the findings, if you will, as I have used all three IDEs at various times in the last few months. What I was surprised by was the feedback from the group, and the logical conclusion of this discussion: Scala is near a tipping point, but IDE support is holding it back.

First off, there was a large turnout for the BASE meeting. I would say it was the second largest BASE meeting, only bested by the June meeting where Martin Odersky spoke. It is funny, because I think our esteemed organizer, Dick Wall, had been intending this topic to be like "well if we have nothing else to talk about it, we'll talk about IDEs." If there had been an alternative topic brought up, I don't think people would have objected. After all, developers and their attitude towards IDEs are contradictory. Most developers I know would tell you that IDE support for a language is very important, but they would also act indifferent about IDEs when it came to them personally. It's like "all of those other developers really need IDEs, but I would be ok without them." We all know our APIs so well, that we don't need code completion, right? And we don't write bugs, so a debugger is of limited use, right? However, I am sure that if the meeting had not been about IDEs, then there would have been less people in attendance.

So why so much interest? Like it or not, but Scala's primary audience right now are Java developers. Yes, I know Scala appeals to some dynamic language folks, and to some functional programming folks, and that its .NET implementation is being updated, but you could sum up all of the Scala developers from those disciplines and it would be dwarfed by the Java contingency. Scala has a lot of appeal on its own merits, but it is always going to be framed against Java. Scala's most (only?) likely path to mass appeal is as "the long term replacement for java."

So when you talk about developers choosing to use Scala, you are really talking about Java developers choosing to use Scala instead of Java. This is not the only use case, but not only is it the most common use case, it is arguably the only use case that matters. Without this use case, Scala will at most be a marginal language, a la Haskell, or OCaml, or Groovy for that matter.

Back to my point... Java developers need great IDEs. This is not because they "need" help from their IDE because of some lack of skill. No, it's because they have had great IDEs for a long time now, and thus it has become a requirement. I remember when I joined Ludi Labs (it was still called Sharefare at the time) several years ago, we had a Java programming quiz. Candidates were given a clean install of Eclipse to use for writing their programs. We could have given them Vi or Emacs and a command line, but that would have been asinine and foolish. IDEs are an integral part of Java development.

I knew all of the above before the BASE meeting, but what I did not know was how many development organizations were at a critical juncture when it comes to Scala. For many folks, Scala, the language, has won the arguments. Whatever perceived extra complexity that it has, has been judged as worth it. Whatever challenges there may be in hiring people to develop in Scala can be mitigated. Legacy code is not even a factor, as integration with existing Java code is trivial. Maybe it's bleak future of Java, or maybe it's the high profile use of Scala at Twitter. Who knows, but Scala is poised to take a big piece of the Java pie.

Thus the missing piece is IDE support. Development orgs can't switch to Scala without IDE support, and the support is not there yet. That's the bad news. The good news is that Scala is ready to explode once the IDE support is there. There are a lot of folks out there ready to adopt Scala simply as a "better Java." They just need an IDE that is on par with Java IDEs. That is the standard.

All of that being said, there is a lot of concern around the IDEs. Many people expressed to me that they are worried that IDE progress is being coupled to the release of Scala 2.8. That seems reasonable at first, but what happens if 2.8 is not released until 2010 sometime? Will Scala lose its momentum and window of opportunity?