It's become quite popular to give advice in a blog post. Hopefully, you can provide great words of wisdom to help others. However, I've noticed that some of these posts have an underlying agenda. First, I've seen meaningless posts used to gain "street cred" and attention even though they don't say anything interesting. Second, I've seen posts whose specific goal is to recast a "problematic history" in a positive light.

Regardless of the reality, one can create a plausible story of the past in a blog. You can neutralize your bad decisions, morally suspect actions, and failures. The trick is to do so in a way that looks like you are giving advice to entrepreneurs. As long as what you are saying sounds reasonable, you'll get credit for brilliance. This may sound implausible: after all, won't there be people in the peanut gallery who disagree? Yes, always. Fortunately, those who disagree will usually be polite, unless you are a complete dick in your post. More importantly, there will always be people who support your position (unless you are completely off the deep end). Trust me. Further, hopefully you have friends who will support you. Even if you don't have friends, there will probably be people who want to ride your coattails and offer positive support. There will be enough lemmings willing to jump on board. (Apologies for the mixed metaphors.) In the end, you can use your blog for self-promotion and for covering up past deeds without providing any value to your readers. How does this work?

It Doesn’t Matter What You Say

Lorem ipsum dolor sit amet, mi cursus mi libero sollicitudin, aliquet posuere in, et quis eleifend consectetuer id nulla, sit luctus. A porttitor egestas nunc, vestibulum a, congue dui, ac dui proin. Vel ligula et, donec ullamcorper nec aliquam habitant quis. Tempor metus ornare ullamcorper nec mauris, pellentesque molestie aliquam, elit adipiscing et sodales enim, sed ante, ut pretium a justo egestas lorem quam. Risus justo proinsed quis cursus, luctus posuere quis, enim ligula orci vehicula venenatis, lacus lorem eget hac et cras. Ullamcorper risus ante, cursus velit ut ut n ulla volutpat, nullam euismod at fringilla varius tinciduntac, quam lectus aliquet sociosqu vitae nam, quis nec pulvinar quisque. Sed rhoncus mauris, mollis suspendisse eros faucibus erat nec consectetuer.

Use Social Proof

Hopefully, you have some friends in your corner, or at least people who want to get something from you and your blog post. Get your friends to tweet and Facebook your post. If you have a damaged reputation or just want to increase the credibility of your post, get someone to co-author it with you. It should be someone you have at least some connection to. It makes you look better. It provides social proof. Also, generously and gratuitously sprinkle in hyperlinks. While authority comes from inbound links to your post, it's easy to fool readers with the opposite: use outbound links to good posts.

Incredibly, even if you don't have any friends or people who want to co-author with you, don't worry. You can just pretend you have a co-author. I didn't ask Wayne to write or edit any part of this post. However, I'll just pretend he did. I didn't even ask for his permission. No one will know. Hopefully, he'll even be pleased, since he gets "social proof" by being a co-author. (Though, if your "co-author" does complain, you might have some 'splaining to do.) Did you know I am co-authoring a blog post with Barack Obama next week?

Format Matters

Even though what you say really doesn't matter, how you say it does. From a semantic perspective, you should say something wise, funny, and seemingly authoritative. Take the perspective of a seasoned veteran who has been "around the block" and has the battle wounds to show for it. Snarky and ironic can work, but be careful.

More important, the format of your post matters. It should be of medium length: clearly longer than a tweet, but if anyone comments "tl;dr", you know you've written too much. Here are some key elements:

  • A catchy headline
  • A picture of you — a d-baggy one where you are wearing sunglasses is best
  • A picture of your co-author
  • A short bio of both you and your co-author
  • A bullet list that summarizes seemingly important points — make it easy for your readers to digest something

Distribution

Readers are important, of course. Tweet your post out early and often. Post it on Facebook. The best time is probably Tuesday or Thursday morning, never Monday. Choose to post on a site that values keeping your post online permanently. I've seen sites move guest posts behind a registration gate after a few weeks. Silly, silly, silly. Your post should be the gift that keeps on giving.

Last, and I shouldn’t need to explain this — make sure you do your SEO work.

Conclusion

See? It doesn't matter what you say. Even though I've let the cat out of the bag, this technique will continue to be exploited, hopefully for the benefit of all. In particular, if your past is troubled, no worries: fix it up with a good blog. Good luck!

(About the authors: Wayne Yamamoto is the co-founder and CEO of Charity Blossom and co-founder of MerchantCircle. Follow him on Twitter at @kazabyte. Jason Culverhouse co-founded Charity Blossom with Wayne and worked with him at MerchantCircle and BroadVision. He can be found @jsonculverhouse on Twitter and blogs at mischievous.org.)

Can we reverse engineer Google’s word correction algorithm given a corpus of misspelled words paired with their corrections?

Since my domain name is the single word mischievous, one of the 100 most misspelled English words, I can analyze some interesting data from Google's Webmaster Tools. I pulled out all the misspellings and their impressions within a small Levenshtein distance. There is a nice academic paper, Learning a Spelling Error Model from Search Query Logs, that I plan to use to explore some of this data in the future.

A chart and regression of the misspelling data on a log-log scale show that impressions of misspellings of the word mischievous vs. the rank at which they appear among all keywords leading to this blog follow Zipf's law. I refitted words with under 10 impressions based on their rank data (ranks >= 83), since Webmaster Tools only gives a sampled value when impressions are greater than 10.
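
The fit itself needs nothing fancy; here is a minimal sketch using ordinary least squares on log-transformed (rank, impressions) pairs, with a few sample points taken from the raw data table below. Zipf's law predicts a straight line with a negative slope on this scale.

```python
import math

# a few (rank, impressions) pairs from the raw data table
data = [(1, 27000), (2, 4500), (3, 700), (7, 500),
        (13, 170), (25, 70), (83, 11.15)]

# least-squares fit of log(impressions) = slope * log(rank) + intercept
xs = [math.log(rank) for rank, _ in data]
ys = [math.log(imp) for _, imp in data]
n = len(data)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
```

A slope near -1 would be the classic Zipf exponent; the exact value here depends on which points you include and how the low-impression tail was refit.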

[Image: mischievous-fvs-r.png (impressions of misspellings vs. rank, log-log scale)]

Raw Data

You can use this table to gauge your spelling (I should add the cumulative distribution so you can see what percentile a misspelling places you in).

rank query impressions levenshtein similarity
1 mischievous 27000.00 0 1.00
2 mischevious 4500.00 2 0.41
3 mischivious 700.00 2 0.50
6 michevious 500.00 3 0.21
7 mischevous 500.00 1 0.64
13 mischiveous 170.00 2 0.50
18 mischieveous 150.00 1 0.67
19 mischivous 150.00 1 0.64
20 michievous 110.00 1 0.64
21 mischeivious 90.00 3 0.39
23 mischeivous 90.00 2 0.50
24 michevous 70.00 2 0.38
25 mischievious 70.00 1 0.67
26 mischeveous 70.00 2 0.41
29 mischeavious 60.00 3 0.39
30 mischiefous 60.00 1 0.60
31 michivious 60.00 3 0.28
32 mischeavous 50.00 2 0.50
33 mishevious 35.00 3 0.28
35 miscevious 35.00 3 0.35
47 mishievous 16.00 1 0.64
48 michievious 16.00 2 0.41
53 misgevious 12.00 4 0.28
54 micheivious 12.00 4 0.20
55 mischvious 12.00 3 0.44
56 mischiveious 12.00 2 0.47
58 mischevios 12.00 3 0.28
83 mischevius 11.15 2 0.35
101 miscevous 8.30 2 0.57
113 micheavous 7.01 3 0.28
133 mischeives 5.48 4 0.28
140 mischeviuos 5.08 3 0.26
153 mischiefious 4.44 2 0.56
176 mischeous 3.60 2 0.47
196 mechivious 3.06 4 0.21
218 miscievious 2.61 2 0.41
223 mechevious 2.52 4 0.15
241 mischieved 2.24 3 0.53
262 myschevious 1.98 3 0.20
263 misjevious 1.96 4 0.28
273 mischeviouse 1.86 3 0.32
277 machivious 1.82 4 0.21
279 mischeiveous 1.80 3 0.39
282 mischives 1.77 3 0.38
321 mischievous? 1.45 1 1.00
324 miscchievous 1.43 1 0.79
333 mischeifous 1.38 3 0.41
334 mistchivious 1.37 3 0.32
351 miscievous 1.27 1 0.64
357 mischieveious 1.24 2 0.63
363 mishcevious 1.21 3 0.26
371 mischievous  1.17 2 1.00
378 mischievous. 1.14 1 1.00
408 micheveous 1.01 3 0.21
422 mischevoius 0.96 2 0.41
430 mistivious 0.94 4 0.28
438 mischievo 0.91 2 0.69
444 misgivious 0.89 4 0.28
483 michivous 0.79 2 0.38
510 mischievous, 0.72 1 1.00
525 mystivious 0.69 5 0.15
528 myschivious 0.69 3 0.26
543 mis chievous 0.66 1 0.67
603 meschivious 0.56 3 0.26
606 mischievoud 0.56 1 0.71
626 mischeviois 0.53 3 0.26
629 micheavious 0.53 4 0.20
635 mishievious 0.52 2 0.41
661 miscivous 0.49 2 0.47
671 meschevious 0.48 3 0.20
676 miss chivous 0.47 3 0.39
734 mischieves 0.42 2 0.53
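
The levenshtein column is plain edit distance against "mischievous"; a minimal dynamic-programming sketch (the similarity column's exact metric isn't reproduced here):

```python
def levenshtein(a, b):
    # classic edit distance: minimum number of insertions,
    # deletions, and substitutions to turn a into b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# matches the table: mischevious is two edits away
assert levenshtein('mischevious', 'mischievous') == 2
```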

Digging through my email, I realized that I had quite a bit of mail related to sales of Facebook Class B shares on the secondary market. I dug through the emails that I saved from November 2011 onward and have eight data points for Facebook sales to date. Here are the results in a handy chart with the obligatory R² for a simple linear regression. I'll also note that these "shares" are not actually purchases of shares but purchases of an investment vehicle designed to hold shares of Facebook via an indirect interest.

[Image: facebook_sales_data.png (share price vs. date with linear fit)]

Facebook Share Sales Data

Detailed Facebook share transaction dates, prices, and volumes that I have collected:

Date                Price   Volume
November 16, 2011   $30.00  75,000
December 9, 2011    $33.00  100,000
December 21, 2011   $32.00  150,000
January 20, 2012    $34.00  70,000
February 8, 2012    $44.00  150,000
February 14, 2012   $42.00  200,000
February 22, 2012   $42.00  125,000
February 29, 2012   $40.00  125,000
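
The simple linear regression and its R² can be sketched directly from the eight points above, regressing price on days since the first sale:

```python
from datetime import date

# (date, price) pairs from the table above
points = [
    (date(2011, 11, 16), 30.00), (date(2011, 12, 9), 33.00),
    (date(2011, 12, 21), 32.00), (date(2012, 1, 20), 34.00),
    (date(2012, 2, 8), 44.00), (date(2012, 2, 14), 42.00),
    (date(2012, 2, 22), 42.00), (date(2012, 2, 29), 40.00),
]

xs = [(d - points[0][0]).days for d, _ in points]  # days since first sale
ys = [p for _, p in points]
n = len(points)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# coefficient of determination
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot
```

The positive slope is the (rough) dollars-per-day appreciation over the period; with only eight points, the R² says more about the small sample than about Facebook.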

When you build a website, it is often filled with objects that use serial column types, usually auto-incrementing integers. Often you want to obscure these numbers, since they may convey business value: number of sales, user reviews, etc. A better idea is to use an actual natural key, which exposes the actual domain name of the object rather than some numeric identifier. It's not always possible to produce a natural key for every object, and when you can't, consider obscuring the serial id.

This doesn't secure numbers that convey business value; it only conceals them from the casual observer. Here is an alternative that uses the bit-mixing properties of exclusive or (XOR), some compression by conversion into "base36" (via reddit), and some bit shuffling that moves the least significant bit to minimize the serial appearance. You should be able to adapt this code to other bit sizes and shuffling patterns with small changes. Just note that I am using signed integers, so it is important to keep the high bit 0 to avoid negative numbers that cannot be converted via the "base36" algorithm.

Twiddling bits in Python isn't fun, so I used the excellent bitstring module:

    from bitstring import Bits, BitArray

    #set the mask to whatever you want, just keep the high bit 0 (or use bitstring's uint)
    XOR_MASK = Bits(int=0x71234567, length=32)

    # base36 the reddit way
    # https://github.com/reddit/reddit/blob/master/r2/r2/lib/utils/_utils.pyx
    # happens to be easy to convert back to an int using int('foo', 36)
    # int with base conversion is case insensitive
    def to_base(q, alphabet):
        if q < 0: raise ValueError("must supply a positive integer")
        l = len(alphabet)
        converted = []
        while q != 0:
            q, r = divmod(q, l)
            converted.insert(0, alphabet[r])
        return "".join(converted) or '0'

    def to36(q):
        return to_base(q, '0123456789abcdefghijklmnopqrstuvwxyz')

    def shuffle(ba, start=(1,16,8), end=(16,32,16), reverse=False):
        """
        flip some bits around
        '0x10101010' -> '0x04200808'
        """
        b = BitArray(ba)
        if reverse:
            map(b.reverse, reversed(start), reversed(end))
        else:
            map(b.reverse, start, end)
        return b  

    def encode(num):
        """
        Encodes numbers to strings

        >>> encode(1)
        've4d47'

        >>> encode(2)
        've3b6v'
        """
        return to36((shuffle(BitArray(int=num,length=32)) ^ XOR_MASK).int)

    def decode(q):
        """
        decodes strings to numbers (case insensitive)

        >>> decode('ve3b6v')
        2

        >>> decode('Ve3b6V')
        2

        """    
        return (shuffle(BitArray(int=int(q,36),length=32) ^ XOR_MASK, reverse=True) ).int
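
If you'd rather avoid the bitstring dependency, the XOR-and-base36 core can be sketched with plain ints; this is a simplified variant that omits the bit-shuffle step, reusing the same arbitrary mask constant as above:

```python
ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyz'
MASK = 0x71234567  # keep the high bit 0, as noted above

def to36(q):
    # positive-integer base-36 encoding, same idea as reddit's to_base
    if q < 0:
        raise ValueError("must supply a positive integer")
    digits = []
    while q:
        q, r = divmod(q, 36)
        digits.append(ALPHABET[r])
    return ''.join(reversed(digits)) or '0'

def encode(num):
    return to36(num ^ MASK)

def decode(s):
    # int() with an explicit base is case-insensitive
    return int(s, 36) ^ MASK
```

Without the shuffle, consecutive ids still differ only in their low bits after the XOR, so the output looks less scrambled than the bitstring version; the XOR alone is still enough to hide magnitudes from a casual observer.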

I'm not a real big Google+ user, but I may consider changing my ways. I really like the "You shared this" feature and its integration with Google's Author Information in Search Results. When you set everything up properly, it leads to "effortless sharing", at least given the latest change to Google's Social Posts in Search Results. If you want to be an influencer among the digerati, it might be time to reevaluate using Google+. These results are also transitive: even if someone isn't directly in your circles, if they are in one of your friends' circles, you can still influence their search results and possibly take up one of the bottom results on the first search page.

A sample of what search results with social posts look like given my circle of friends:

This is a SERP for an article that was "shared by me" because I have a Google+ author profile link on my blog pages. I never actually shared this article, but Google can still identify it as "shared by me":

jason_culverhouse_author_profile.png

Here are some friends of mine influencing my search results with very generic search terms that they would generally not rank on the first page of results:

Wayne Yamamoto for the search terms "social proof". At the time I took the screenshot, Wayne had not shared this via Google+, but he can still pick up the last result in my SERP.

wayne_yamamoto_social_proof.png

Kevin Leu for the search terms "Silicon Valley". Kevin usually shares everything on Google+ and is able to pick up two results on the front page for Silicon Valley when I am logged into search.

kevin_leu_silicon_valley.png

If I am in your circles and you repeat these searches, chances are my friends can influence your search results.

Invest in your Google+ profile; it's like a Facebook feed in every Google search.

Recently there have been a few articles published on the structure of Airbnb's latest financing round, in TechCrunch and by Kara Swisher, as well as an excellent blog post by Felix Salmon. The premise of these articles was the fairness of the deal between insiders who are both option holders and shareholders. The thing that really struck me was when Chamath Palihapitiya said:

In contrast, if you are viewed as self-dealing and shady, it will only hurt your long term prospects

I see aspects of "self-dealing" (the conduct of a corporate officer who takes advantage of his position in a transaction and acts in his own interest rather than in the interest of the corporate shareholders) as a recent phenomenon in Silicon Valley. It's possible that this is exacerbated by the recent lack of IPO opportunities due to current market conditions, leading to more "creative" forms of achieving liquidity.

On May 26, 2011, my former employer MerchantCircle was acquired by Reply.com for $60 million. At the time of the acquisition, as an early employee, I held around 8% of the unconverted exercised common shares and 1% of the fully converted shares. Investors received a packet detailing the framework of the deal in a 1,200-page deal document (see image) and were given a week to review the documentation. I picked up the paperwork on June 6th and then sent this letter to the board on June 9th. There were some details of the deal that I missed, but the paperwork itself was, in my opinion, purposefully obfuscated. I never received any reply from the Board or the Company about my concerns, even though my questions are neither specious nor rhetorical. The thing that stood out the most in the deal was the fact that the former CEO, Ben T. Smith IV:

"will have the opportunity to receive in the Merger an amount of cash per share of his Company Common Stock that is substantially larger than the amount of cash per share than the other holders of Company Common Stock"

The definition of "fair" when related to stock has taken on a whole new meaning in Silicon Valley.

Under the "terms of the deal", investors were allowed to sell up to 37.5% of their shares; employees were also allowed to sell 37.5% of total options granted, as long as this was not greater than their total vested options. Employees were thus able to "lever" unvested options to sell vested options. The structure of this deal actually punished the investors who owned their shares outright, since they could only sell 37.5%. Employees potentially got to sell 100% of their vested options as long as 37.5% or less were vested. In this case, the CEO granted himself 2,000,000 shares of common stock less than six months before the deal closed and exploited this clause to cash out as many of his shares as possible. This resulted in a cash dilution to investors, and the transfer's major beneficiary was Ben T. Smith IV.
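
To make the lever concrete, here is the sellable-share rule as described above, with hypothetical numbers:

```python
def sellable_options(total_granted, vested):
    # per the deal terms: up to 37.5% of total options granted,
    # but never more than the options actually vested
    return min(0.375 * total_granted, vested)

# an employee with 1,000,000 options granted but only 375,000 vested
# can sell every vested option: 100% of vested, 37.5% of granted
assert sellable_options(1_000_000, 375_000) == 375_000
```

An investor holding shares outright, by contrast, could sell only 37.5% of them, which is the asymmetry described above.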

A usual rule of thumb is that the more risk you take, the more reward you receive. In this case, unvested options were worth as much as a purchased share owned by a non-employee: zero risk returned outsized reward, especially in the case of the CEO.

Here is my letter to the CEO and the Board in its entirety:

Subject: Open Letter To the MerchantCircle Board
Date: June 9, 2011 11:35:31 AM PDT
To: Ben Smith IV, Members of the MerchantCircle Board

Board Members,
Let me start by saying that I am happy the deal with Reply.com was "done". For the employees, many of whom are my personal friends, I feel that the terms of the deal are very generous. Almost all of the employees who have worked at the company for more than one year are able to elect to sell 100% of their vested shares at a reasonable valuation. This is an excellent outcome.

The following facts regarding the deal give me concerns:

  • On December 22, 2010, the company decided to grant 2,144,000 shares to Ben Smith; under the terms of the Reply deal, these shares allow him to earn an additional $924,520.
  • This grant was so large that it exceeded the amount of ISO options allowed to be granted in a single year; the ISO component alone was the largest grant by value in the history of the company.
  • Some members of the Board, former employees, and I invested money into the company, yet we are only able to receive 37.5% of our common holdings in cash. The impact of this transaction was to create an additional class of shares, unable to take advantage of the magical December 22, 2010 grants. The merger documents identify this additional class of shares as the Series C-3 Preferred.
  • The value of former non-founding employees' shares is almost equal to the $920,000 that was transferred by the December 22, 2010 grant. Our common shareholder value has been leveraged to pay for that grant.
  • The merger documents themselves identify Ben Smith as receiving a "Golden Parachute" and inform me that "there is a presumption that the options granted ... were granted in contemplation of the change of control".

I am left with these unanswered questions:

  • Is there any concern that the timeline in this deal may constitute "self-dealing" by a corporate officer?
  • When the Board approved the December 22, 2010 grants, were they aware that the amounts were structured to match the deal?
  • Were the grants to Ben Smith abnormally favorable, a "sweetheart deal"?
  • Do these actions somehow deny Common shareholders equal status?
  • Why would the investors leave all their money "on the table"?

The Valley runs on reputation. Have we all suffered a loss of reputation through our association with Ben Smith and the terms of this deal as they regard former-employee Common shareholders?

I hope that we can work together to resolve these questions and concerns; feel free to contact me.

Sincerely,

Jason Culverhouse

1,200 Pages Of Deal

Removing A Django Application Completely with South

Let's pretend that the application you want to remove contains the following model in myapp/models.py:

    class SomeModel(models.Model):
        data = models.TextField()

Create the initial migration and apply it to the database:

    ./manage.py schemamigration --initial myapp
    ./manage.py migrate

To remove the models, edit myapp/models.py and remove all of the model definitions.

Create the deleting migration:

    ./manage.py schemamigration myapp

Edit myapp/migrations/0002_auto__del_somemodel.py to remove the related content types:

    from django.contrib.contenttypes.models import ContentType
    ...
    def forwards(self, orm):

        # Deleting model 'SomeModel'
        db.delete_table('myapp_somemodel')
        for content_type in ContentType.objects.filter(app_label='myapp'):
            content_type.delete()

Migrate the app and remove the table, then fake a zero migration to clean out the South tables:

    ./manage.py migrate
    ./manage.py migrate myapp zero --fake

Remove the app from your settings.py and it should now be fully gone.

Django has a built-in sitemap generation framework that uses views to build a sitemap on the fly. Sometimes your dataset is too large for this to work in a web application. Here is a management command that will generate a static sitemap and index for your models. You can extend it to handle multiple models.

    import os.path
    from django.core.management.base import BaseCommand, CommandError
    from django.contrib.sitemaps import GenericSitemap
    from django.contrib.sites.models import Site
    from django.template import loader
    from django.utils.encoding import smart_str

    from myproject.models import MyModel


    class Command(BaseCommand):
        help = """Generates the sitemaps for the site; pass in an output directory"""

        def handle(self, *args, **options):
            if len(args) != 1:
                raise CommandError('You need to specify an output directory')
            directory = args[0]
            if not os.path.isdir(directory):
                raise CommandError('directory %s does not exist' % directory)
            # modify to meet your needs
            sitemap = GenericSitemap({'queryset': MyModel.objects.order_by('id'), 'date_field': 'modified'})
            current_site = Site.objects.get_current()

            index_files = []
            paginator = sitemap.paginator
            for page_num in range(1, paginator.num_pages + 1):
                filename = 'sitemap_%s.xml' % page_num
                file_path = os.path.join(directory, filename)
                index_files.append("http://%s/%s" % (current_site.domain, filename))
                print "Generating sitemap %s" % file_path
                with open(file_path, 'w') as site_mapfile:
                    site_mapfile.write(smart_str(loader.render_to_string('sitemap.xml', {'urlset': sitemap.get_urls(page_num)})))
            sitemap_index = os.path.join(directory, 'sitemap_index.xml')
            with open(sitemap_index, 'w') as site_index:
                print "Generating sitemap_index.xml %s" % sitemap_index
                site_index.write(loader.render_to_string('sitemap_index.xml', {'sitemaps': index_files}))

Experimenting with Node.js, MongoDB, and Mongoose

I came across Mongoose for Node.js. It looks like a promising project, but I ran into a bug as soon as I started playing with a simple counter program. The problem is in the implementation of QueryPromise's atomic functions. Here is a sample program that updates a counter. The three update forms below should all be identical; only the first seems to work with the version I was playing with.

    // Simple test program to show a problem in QueryPromise
    // ['inc','set','unset','push','pushAll','addToSet','pop','pull','pullAll']

    var sys = require('sys');
    var mongoose = require('mongoose/').Mongoose;
    var db = mongoose.connect('mongodb://localhost/test');

    var Simple = mongoose.noSchema('test', db);
    Simple.drop();
    // there should only be one...
    var m = new Simple({name: 'test', x: 0, y: 0}).save();
    // these should behave the same
    Simple.update({name: 'test'}, {'$inc': {x: 1, y: 1}}).execute();
    Simple.update({name: 'test'}).inc({x: 1, y: 1}).execute();
    Simple.update({name: 'test'}).inc({x: 1}).inc({y: 1}).execute();

    Simple.find({name: 'test'}).each(
        function (doc) {
            sys.puts(JSON.stringify(doc));
        }
    ).then(
        function () { // promise (executes after the query)
            Simple.close(); // close the event loop
        }
    );

Here is a fixed version of QueryPromise's atomic functions that place the command and arguments in the correct place.

    // atomic modifiers
    ['inc','set','unset','push','pushAll',
     'addToSet','pop','pull','pullAll'].forEach(function(cmd){
        QueryPromise.prototype[cmd] = function(modifier){
            if(this.op.name.charAt(0) != 'u') return this;
            if(!this.op.args.length) this.op.args.push({},{});
            if(this.op.args.length == 1) this.op.args.push({});
            for(var i in modifier) {
                if(!(this.op.args[1]['$'+cmd] instanceof Object)) this.op.args[1]['$'+cmd] = {};
                this.op.args[1]['$'+cmd][i] = modifier[i];
            }
            return this;
        };
    });

A friend just asked how to do city/state lookup on input strings. I've used metaphones and Levenshtein distance in the past, but that seems like overkill. Using an n-gram index is a nice and easy solution:

  1. easy_install ngram

  2. Build a file with all the city and state names, one per line, and place it in citystate.data ("Redwood City, CA", "Redwood, VA", etc.)

  3. Experiment (the .2 threshold is a little lax):

    import string
    import ngram
    cityStateParser = ngram.NGram(
        items=(line.strip() for line in open('citystate.data')),
        N=3, iconv=string.lower, qconv=string.lower, threshold=.2
    )

Example:

cityStateParser.search('redwood')
[('Redwood VA', 0.5),
('Redwood NY', 0.5),
('Redwood MS', 0.5),
('Redwood City CA', 0.36842105263157893),
...
]

Notes: Because these are n-grams, you might get overmatching when the state abbreviation is part of an n-gram in the city; i.e., a search for "washington" could yield "Washington IN" with a better score than "Washington OK".
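
The scores come from shared n-grams; here is a rough pure-Python sketch of trigram similarity (plain Jaccard over trigram sets, not the ngram package's exact padded-and-weighted scoring):

```python
def trigrams(s, n=3):
    # the set of all n-character substrings, lowercased
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity(a, b):
    # Jaccard similarity over trigram sets
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# every trigram of 'redwood' also appears in 'redwood city'
assert similarity('redwood', 'redwood city') == 0.5
```

This also makes the overmatching easy to see: all of the query's trigrams can be shared with several candidates, so only the extra trigrams contributed by the city/state tail break the tie.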

You might also want to read Using Superimposed Coding Of N-Gram Lists For Efficient Inexact Matching (PDF download).

If this works for you, consider giving me a vote on StackOverflow.com
