When you build a website, it is often filled with objects that use serial column types, usually auto-incrementing integers. Often you want to obscure these numbers since they may convey business information, e.g. the number of sales, users, reviews, etc. A better idea is to use an actual natural key, which exposes the domain name of the object rather than a numeric identifier. It's not always possible to produce a natural key for every object, and when you can't, consider obscuring the serial id.

This doesn't secure the numbers that convey business value; it only conceals them from the casual observer. Here is an alternative that uses the bit-mixing properties of exclusive or (XOR), some compression by conversion into "base36" (via reddit), and some bit shuffling so that the least significant bit is moved, which minimizes the serial appearance. You should be able to adapt this code to alternative bit sizes and shuffling patterns with some small changes. Just note that I am using signed integers, and it is important to keep the high bit 0 to avoid negative numbers, which cannot be converted via the "base36" algorithm.

Twiddling bits in Python isn't fun, so I used the excellent bitstring module:

    from bitstring import Bits, BitArray

    #set the mask to whatever you want, just keep the high bit 0 (or use bitstring's uint)
    XOR_MASK = Bits(int=0x71234567, length=32)

    # base36 the reddit way 
    # https://github.com/reddit/reddit/blob/master/r2/r2/lib/utils/_utils.pyx
    # happens to be easy to convert back to an int using int('foo', 36)
    # int with base conversion is case insensitive
    def to_base(q, alphabet):
        if q < 0: raise ValueError("must supply a positive integer")
        l = len(alphabet)
        converted = []
        while q != 0:
            q, r = divmod(q, l)
            converted.insert(0, alphabet[r])
        return "".join(converted) or '0'

    def to36(q):
        return to_base(q, '0123456789abcdefghijklmnopqrstuvwxyz')

    def shuffle(ba, start=(1,16,8), end=(16,32,16), reverse=False):
        """
        flip some bits around
        '0x10101010' -> '0x04200808'
        """
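        b = BitArray(ba)
        if reverse:
            pairs = zip(reversed(start), reversed(end))
        else:
            pairs = zip(start, end)
        # explicit loop: map() is lazy on Python 3 and would never call reverse
        for s, e in pairs:
            b.reverse(s, e)
        return b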

    def encode(num):
        """
        Encodes numbers to strings

        >>> encode(1)
        've4d47'

        >>> encode(2)
        've3b6v'
        """
        return to36((shuffle(BitArray(int=num,length=32)) ^ XOR_MASK).int)

    def decode(q):
        """
        Decodes strings to numbers (case insensitive)

        >>> decode('ve3b6v')
        2

        >>> decode('Ve3b6V')
        2

        """    
        return (shuffle(BitArray(int=int(q,36),length=32) ^ XOR_MASK, reverse=True) ).int
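
If you don't need the bit shuffle, the XOR and base36 steps can be sketched with plain integers and no bitstring dependency. This is a simplified illustration rather than a drop-in replacement: the names `obscure` and `reveal` are made up for this sketch, and because the shuffle is skipped the strings it produces should not be expected to match the `encode` output above.

```python
# Same mask as above; the high bit is 0 so the result stays non-negative
XOR_MASK = 0x71234567
ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyz'


def to36(q):
    """Convert a non-negative integer to a lowercase base36 string."""
    if q < 0:
        raise ValueError("must supply a non-negative integer")
    digits = []
    while q:
        q, r = divmod(q, 36)
        digits.append(ALPHABET[r])
    return ''.join(reversed(digits)) or '0'


def obscure(num):
    """Hypothetical helper: XOR the id with the mask, then base36 it."""
    return to36(num ^ XOR_MASK)


def reveal(text):
    """Hypothetical helper: undo obscure(); int(..., 36) ignores case."""
    return int(text, 36) ^ XOR_MASK
```

XOR with a fixed mask is its own inverse, which is why `reveal` only has to parse the string and apply the same mask again.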

I'm not a big Google+ user, but I may consider changing my ways. I really like the "You shared this" feature and its integration with Google's Author Information in Search Results. When you set everything up properly it leads to "effortless sharing", at least given the latest change to Google's Social Posts in Search Results. If you want to be an influencer among the digerati, it might be time to reevaluate using Google+. These results are also transitive: even if someone isn't directly in your circle, if they are in one of your friends' circles you can still influence their search results and possibly take up one of the bottom results on the first search page.

A sample of what search results with social posts look like given my circle of friends:

This is a SERP for an article that was "shared by me" because I have a Google+ author profile link on my blog pages. I never had to share this article, but Google can identify it as "shared by me".

jason_culverhouse_author_profile.png

Here are some friends of mine influencing my search results with very generic search terms that they would generally not rank on the first page of results:

Wayne Yamamoto for the search terms "social proof". At the time I took the screenshot, Wayne had not shared this via Google+, but he can still pick up the last result in my SERP.

wayne_yamamoto_social_proof.png

Kevin Leu for the search terms "Silicon Valley". Kevin usually shares everything on Google+ and is able to pick up two results on the front page for Silicon Valley when I am logged into search.

kevin_leu_silicon_valley.png

If I am in your circle and you repeat these searches, chances are my friends can influence your search results.

Invest in your Google+ profile; it's like a Facebook feed in every Google search.

Recently there have been a few articles published on the structure of Airbnb's latest financing round, in TechCrunch and by Kara Swisher, as well as the excellent blog post by Felix Salmon. The articles focused on the fairness of the deal among insiders, both option holders and shareholders. The one thing that really struck me was when Chamath Palihapitiya said:

In contrast, if you are viewed as self-dealing and shady, it will only hurt your long term prospects

I see aspects of "self-dealing" (the conduct of a corporate officer that consists of taking advantage of his position in a transaction and acting in his own interests rather than in the interests of the corporate shareholders) as a recent phenomenon in Silicon Valley. It's possible that this is exacerbated by the recent lack of IPO opportunities under current market conditions, leading to more "creative" forms of achieving liquidity.

On May 26, 2011, my former employer MerchantCircle was acquired by Reply.com for $60 million. At the time of the acquisition, as an early employee, I held around 8% of the unconverted exercised common shares and 1% of the fully converted shares. Investors received a packet detailing the framework of the deal in a 1,200 page deal document (see image) and were given a week to review the documentation. I picked up the paperwork on June 6th and sent this letter to the board on June 9th. There were some details of the deal that I missed, but the paperwork itself was, in my opinion, purposefully obfuscated. I never received any reply from the Board or the Company about my concerns, even though my questions are neither specious nor rhetorical. The thing that stood out the most in the deal was the fact that the former CEO, Ben T. Smith IV:

"will have the opportunity to receive in the Merger an amount of cash per share of his Company Common Stock that is substantially larger than the amount of cash per share than the other holders of Company Common Stock"

The definition of "fair" when related to stock has taken on a whole new meaning in Silicon Valley.

Under the "terms of the deal", investors were allowed to sell up to 37.5% of their shares; employees were also allowed to sell 37.5% of total options granted, as long as this was not greater than their total vested options. Employees were able to "lever" unvested options to sell vested options. The structure of this deal actually punished the investors who owned their shares, since they could only sell 37.5%. Employees potentially got to sell 100% of their vested options as long as 37.5% or less were vested. In this case, the CEO granted himself 2,000,000 shares of common less than 6 months before the deal closed and exploited this clause to cash out as many of his shares as possible. This resulted in a cash dilution to investors, and the transfer's major beneficiary was Ben T. Smith IV.

A usual rule of thumb is that the more risk you take, the more reward you receive. In this case, unvested options were worth as much as a purchased share owned by a non-employee. Zero risk returned outsized reward, especially in the case of the CEO.

Here is my letter to the CEO and the Board in its entirety:

Subject: Open Letter To the MerchantCircle Board
Date: June 9, 2011 11:35:31 AM PDT
To: Ben Smith IV, Members of the MerchantCircle Board

Board Members,
Let me start by saying that I am happy the deal with Reply.com was "done". For the employees, many of whom are my personal friends, I feel that the terms of the deal are very generous. Almost all of the employees who have worked at the company for more than one year are able to elect to sell 100% of their vested shares at a reasonable valuation. This is an excellent outcome.

The following facts regarding the deal give me concerns:

  • On December 22, 2010 the company decided to grant 2,144,000 shares to Ben Smith, under the terms of the Reply deal these shares allow him to earn an additional $924,520.
  • This grant was so large that it exceeded the amount of ISO options that are allowed to be granted in a single year, the ISO component alone was the single largest grant in value in the history of the company.
  • Some members of the Board, former employees, and myself invested money into the company yet we are only able to receive 37.5% of our common holding in cash. The impact of this transaction was to create an additional class of shares, unable to take advantage of the magical December 22, 2010 grants. The merger documents identify an additional class of shares as the Series C-3 Preferred.
  • The value of former non-founding employees' shares is almost equal the $920,000 that was transferred by the December 22, 2010 grant. Our common share holder value has been leveraged to pay for that grant.
  • The merger documents themselves identify Ben Smith as receiving a "Golden Parachute" and inform me that "there is a presumption that the options granted ... were granted in contemplation of the change of control" .

I am left with these unanswered questions:

  • Is there any concern that the the timeline in this deal may construe "Self-dealing" by a corporate officer?
  • When the Board approved these December 22, 2010 grants, were they aware that the amounts were structured to match the deal?
  • Were grants to Ben Smith abnormally favorable or a "sweetheart deal"?
  • Do these action somehow deny Common shareholders equal status?
  • Why would the investors leave all their money "on the table"?

The Valley is run on reputation. Do we all suffered a loss of reputation by our association with Ben Smith and the terms of this deal in regards to former employee Common shareholders?

I hope that we can work together to resolve these questions and concerns, feel free to contact me.

Sincerely,

Jason Culverhouse

1,200 Pages Of Deal

Removing A Django Application Completely with South

Let's pretend that the application that you want to remove contains the following model in myapp/models.py:

    class SomeModel(models.Model):
        data = models.TextField()

Create the initial migration and apply it to the database:

    ./manage.py schemamigration --initial myapp
    ./manage.py migrate

To remove the models, edit myapp/models.py and remove all the model definitions.

Create the deleting migration:

    ./manage.py schemamigration myapp

Edit myapp/migrations/0002_auto__del_somemodel.py to remove the related content types:

    from django.contrib.contenttypes.models import ContentType
    ...
    def forwards(self, orm):

        # Deleting model 'SomeModel'
        db.delete_table('myapp_somemodel')
        for content_type in ContentType.objects.filter(app_label='myapp'):
            content_type.delete()

Migrate the app to remove the table, then fake a zero migration to clean out the South tables:

    ./manage.py migrate
    ./manage.py migrate myapp zero --fake

Remove the app from your settings.py and it should now be fully gone.

Django has a built-in sitemap generation framework that uses views to build a sitemap on the fly. Sometimes your dataset is too large for this to work in a web application. Here is a management command that will generate a static sitemap and index for your models. You can extend it to handle multiple models.

import os.path 
from django.core.management.base import BaseCommand, CommandError
from django.contrib.sitemaps import GenericSitemap
from django.contrib.sites.models import Site
from django.template import loader 
from django.utils.encoding import smart_str

from myproject.models import MyModel


class Command(BaseCommand):
    help = """Generates the sitemaps for the site, pass in an output directory
    """

    def handle(self, *args, **options):
        if len(args) != 1:
            raise CommandError('You need to specify an output directory')
        directory = args[0]
        if not os.path.isdir(directory):
            raise CommandError('directory %s does not exist' % directory)
        #modify to meet your needs
        sitemap = GenericSitemap({'queryset': MyModel.objects.order_by('id'), 'date_field':'modified' })
        current_site = Site.objects.get_current()

        index_files = []
        paginator = sitemap.paginator
        for page_num in range(1, paginator.num_pages+1):
            filename = 'sitemap_%s.xml' % page_num
            file_path = os.path.join(directory,filename)
            index_files.append("http://%s/%s" % (current_site.domain, filename))
            print "Generating sitemap %s" % file_path
            with open(file_path, 'w') as site_mapfile:
                site_mapfile.write(smart_str(loader.render_to_string('sitemap.xml', {'urlset': sitemap.get_urls(page_num)})))
        sitemap_index = os.path.join(directory,'sitemap_index.xml')
        with open(sitemap_index, 'w') as site_index:
            print "Generating sitemap_index.xml %s" % sitemap_index
            site_index.write(loader.render_to_string('sitemap_index.xml', {'sitemaps': index_files}))
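
For planning how many files the command will write, the page arithmetic can be sketched on its own. This is a standalone illustration, not part of the command: `sitemap_urls` is a hypothetical helper, and it assumes Django's default sitemap limit of 50,000 URLs per file and that an empty queryset still produces one page.

```python
def sitemap_urls(total_items, domain, per_page=50000):
    """Return the sitemap file URLs that would be listed in the index.

    Hypothetical helper for illustration; per_page mirrors the default
    limit of 50,000 URLs per sitemap file.
    """
    # Ceiling division; an empty queryset still yields one (empty) page
    num_pages = max(1, (total_items + per_page - 1) // per_page)
    return ["http://%s/sitemap_%s.xml" % (domain, n)
            for n in range(1, num_pages + 1)]
```

For example, 120,000 objects on example.com would give three files, sitemap_1.xml through sitemap_3.xml, plus the index.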

Experimenting with Node.js, MongoDB and Mongoose

I came across Mongoose for Node.js. It looks like a promising project, but I ran into a bug as soon as I started playing with a simple counter program. The problem is in the implementation of QueryPromise's atomic functions. Here is a sample program that updates a counter. The three update forms below should all be identical, but only the first seems to work with the version I was playing with.

// Simple test program to show a problem in QueryPromise
// ['inc','set','unset','push','pushAll','addToSet','pop','pull','pullAll']

var sys = require('sys')
var mongoose = require('mongoose/').Mongoose
var db = mongoose.connect('mongodb://localhost/test');

var Simple = mongoose.noSchema('test',db);
Simple.drop(); 
//should only be one....
var m = new Simple({name:'test', x:0,y:0}).save()
// these should behave the same
Simple.update({name:'test'},{'$inc':{x:1, y:1}}).execute();
Simple.update({name:'test'}).inc({x:1, y:1}).execute();
Simple.update({name:'test'}).inc({x:1}).inc({y:1}).execute();

Simple.find({name:'test'}).each(
     function (doc) {
         sys.puts(JSON.stringify(doc));
     }
).then(
    function(){ // promise (execute after query)
        Simple.close(); // close event loop
    }
);

Here is a fixed version of QueryPromise's atomic functions that places the command and arguments in the correct place.

// atomic similar

  ['inc','set','unset','push','pushAll',
  'addToSet','pop','pull','pullAll'].forEach(function(cmd){
      QueryPromise.prototype[cmd] = function(modifier){
        if(this.op.name.charAt(0) != 'u') return this;
        if(!this.op.args.length) this.op.args.push({},{});
        if(this.op.args.length == 1) this.op.args.push({});
        for(var i in modifier) {
          if(!(this.op.args[1]['$'+cmd] instanceof Object)) this.op.args[1]['$'+cmd] = {};
          this.op.args[1]['$'+cmd][i] = modifier[i];
        }
        return this;
      }
  });
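
The intent of the fix is easier to see outside of Mongoose. Here is a small Python analogue (not Mongoose code; `UpdateBuilder` is a made-up class for illustration) showing what the chained form should build: each call merges its fields under the same `$`-prefixed operator in the second argument, so all three update forms end up with an identical update document.

```python
class UpdateBuilder(object):
    """Hypothetical stand-in for a chained update query."""

    def __init__(self, query):
        # args[0] is the selector, args[1] is the update document
        self.args = [query, {}]

    def _modify(self, cmd, modifier):
        # Reuse the existing '$cmd' dict if an earlier call created it
        ops = self.args[1].setdefault('$' + cmd, {})
        ops.update(modifier)
        return self

    def inc(self, modifier):
        return self._modify('inc', modifier)

    def set(self, modifier):
        return self._modify('set', modifier)


one_call = UpdateBuilder({'name': 'test'}).inc({'x': 1, 'y': 1})
chained = UpdateBuilder({'name': 'test'}).inc({'x': 1}).inc({'y': 1})
```

Both `one_call.args` and `chained.args` hold the selector followed by a single `$inc` entry covering `x` and `y`.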

A friend just asked how to do city/state lookup on input strings. I've used metaphones and Levenshtein distance in the past, but that seems like overkill. Using n-grams is a nice and easy solution:

  1. easy_install ngram

  2. Build a file with all the city and state names, one per line, and save it as citystate.data (e.g. Redwood City, CA; Redwood, VA; etc.)

  3. Experiment (the .2 threshold is a little lax):

import string
import ngram
cityStateParser = ngram.NGram(
  items = (line.strip() for line in open('citystate.data')) ,
  N=3, iconv=string.lower, qconv=string.lower,  threshold=.2
)

Example:

cityStateParser.search('redwood')
[('Redwood VA', 0.5),
('Redwood NY', 0.5),
('Redwood MS', 0.5),
('Redwood City CA', 0.36842105263157893),
...
]

Notes: Because these are n-grams, you might get overmatching when the state is part of an n-gram in the city, i.e. a search for "washington" would yield "Washington IN" with a better score than "Washington OK".
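
To get a feel for where the scores come from, here is a rough stdlib-only sketch of trigram matching. It is an assumption-laden illustration, not the ngram module's actual algorithm: it pads each string with spaces, lowercases it, and scores two strings by the share of trigrams they have in common (Jaccard similarity), so the numbers will not match the output above. The function names are made up for this sketch.

```python
def trigrams(text, n=3):
    """Set of character n-grams of the lowercased, space-padded text."""
    padded = ' ' * (n - 1) + text.lower() + ' ' * (n - 1)
    return set(padded[i:i + n] for i in range(len(padded) - n + 1))


def similarity(a, b):
    """Jaccard similarity of the two trigram sets, from 0.0 to 1.0."""
    ga, gb = trigrams(a), trigrams(b)
    return len(ga & gb) / float(len(ga | gb))


def search(query, items, threshold=0.2):
    """Return (item, score) pairs scoring at least threshold, best first."""
    scored = [(item, similarity(query, item)) for item in items]
    return sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)


results = search('redwood', ['Redwood City CA', 'Redwood VA', 'Boston MA'])
```

Shorter candidates share a larger fraction of their trigrams with the query, which is why "Redwood VA" ranks ahead of "Redwood City CA" here as well.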

You might also want to read Using Superimposed Coding Of N-Gram Lists For Efficient Inexact Matching (PDF Download)

If this works for you, consider giving me a vote on StackOverflow.com

I was concerned to see that SQLite was deprecated in Movable Type 5.0, but I went ahead and upgraded my blog. I followed the standard procedure: copy the new version over the old version, then run mt-upgrade.cgi via the browser. The upgrade script never made it to migrating the database. When this happened, I just used the "Upgrade a large database" instructions:

$ export MT_HOME=/var/local/mv
$ cd $MT_HOME
$ perl  ./tools/upgrade --name superuser
upgrade -- A command line tool for upgrading the schema for Movable Type.
    * Upgrading database from version 4.0070.
    * Upgrading table for Website records...
    * Upgrading table for MT::Entry::Summary records...
    * Upgrading table for entry_rev records...
    * Upgrading table for Entry records...
    * Upgrading table for Asset Placement records...
    * Upgrading table for Session records...
    * Upgrading table for MT::Author::Summary records...
    * Upgrading table for User records...
    * Upgrading table for template_rev records...
    * Upgrading table for Template records...
    * Upgrading table for Permission records...
    * Upgrading table for Comment records...
    * Rebuilding permissions...
    * Rebuilding permissions... (100%)
    * Updating existing role name...
    * Populating new role for website...
    * Migrating mtview.php to MT5 style...
    * Assigning new system privilege for system administrator...
    * Assigning to  jason...
    * Updating existing role name...
    * Populating new role for theme...
    * Upgrading Asset path informations...
    * Classifying blogs...
    * Classifying blogs... (100%)
    * Merging dashboard settings...
    * Merging dashboard settings... (100%)
    * Migrating existing 1 blog into websites and its children...
    * Generated a website http://mischievous.org/
    * Moved blog Pseudointellectual Appendification (http://www.mischievous.org/) under website mischievous.org
    * Creating new template: 'Comment Listing'.
    * Database has been upgraded to version 5.0016.
Upgrade complete!

My school sent home a set of 3 CDs with Glencoe StudentWorks software from circa 2007. This software consists of a Flash application that launches Adobe Acrobat version 7 to display the PDF content of the California Mathematics Grade 4 workbooks. Children are expected to complete homework assignments using this software. My primary operating system is Mac OS X v10.6 Snow Leopard; the application "Student Works OSX" is a PowerPC application and would require Rosetta to be installed to run.

Thankfully, the PDFs are on the CD and you do not need to run the application to see the content. Just navigate to /Volumes/CA Math 4/support/PDF/docs and you will find the PDF for each individual chapter.
On Mac or Windows, there is no need to install or run any application from the disc. If you are running Windows, download the latest Acrobat; don't install the antiquated version on the disc.

Printing is another matter: you can't. The chapters are password-protected against printing. I'm not a lawyer, but I read the standard shrink-wrap licensing agreement that came with the software, and here is what I found:

COPIES. Copies can be made only as authorized above in machine readable form. Print copies of Software code are not authorized. All copyright and trademark notices must remain on all copies. All copies must be faithful reproductions. You are solely responsible for the content, quality and operation of all Software copies. Certain Software programs may be "copy protected" by special encryption coding that prevents copying or printing-out content

And

You may also make one (1) back-up copy of the Software for archival purpose

Great, I choose to back up the PDF portions of the "software" by printing on paper. Here is my "backup" program; you just need Ghostscript from MacPorts.


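    #!/bin/bash -x
    for i in "$@"; do
        NAME=$(basename "$i")
        # rewrite the PDF with Ghostscript; the copy lands in the current directory
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$NAME" -c .setpdfwrite -f "$i"
        # lp submits the job to the print queue (lpd is the daemon, not the client)
        lp -d "SCX_4500" "$NAME"
        rm "$NAME"
    done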

If you live in San Mateo County you can use your Peninsula Library System Card to access Safari Books Online.

If you manage to find the link on the plsinfo.org website, you will find that the proxy URL is incorrect.
Here is the correct link: Safari Books Online. You need your library card.

Read away.

You can also see a list of all the resources you can access via your library card.
