mdworker Unable to use font: no glyphs present

Are you running Mac OS X and finding your system.log filling up with messages like these?

mdworker[473]: Unable to use font: no glyphs present.

/System/Library/Frameworks/ApplicationServices.framework/Frameworks/ATS.framework/Support/ATSServer[474]: Serious problems were found in font data while activating it.

/System/Library/Frameworks/ApplicationServices.framework/Frameworks/ATS.framework/Support/ATSServer[474]: You may encounter drawing or printing problems.

Well, it could be Spotlight trying to index a bad PDF file. To find the offending file, use lsof with the process ID of the mdworker process:

  lsof -p 473

In my case it was a PDF from the Hadoop 20.0 release.
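
If the log is long, the two manual steps above (spot the mdworker PID, then look through the lsof listing for a PDF) could be scripted. This is only a rough sketch: the function names mdworker_pids and open_pdfs are made up for illustration, and it assumes the default nine-column lsof layout in which NAME is the last column.

```python
import re

# Matches lines such as "mdworker[473]: Unable to use font: no glyphs present."
FONT_ERROR = re.compile(r"mdworker\[(\d+)\]: Unable to use font")


def mdworker_pids(log_lines):
    """Return the sorted, de-duplicated PIDs of mdworker processes that logged the font error."""
    pids = set()
    for line in log_lines:
        match = FONT_ERROR.search(line)
        if match:
            pids.add(int(match.group(1)))
    return sorted(pids)


def open_pdfs(lsof_output):
    """Return the NAME column of lsof rows whose path ends in .pdf.

    Assumption: default lsof output, header row first, nine columns.
    """
    paths = []
    for row in lsof_output.splitlines()[1:]:   # skip the header row
        fields = row.split(None, 8)            # NAME may contain spaces
        if len(fields) == 9 and fields[8].lower().endswith(".pdf"):
            paths.append(fields[8])
    return paths
```

Feed mdworker_pids the lines of system.log, run lsof -p for each PID it returns, and pass the captured output to open_pdfs to get the candidate files.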

Cascading and Coroutines

Cascading looks quite interesting. Here is a Python program that does something similar to the example in its Technical Overview; the pipeline is assembled in main().

    #!/usr/bin/env python
    # encoding: utf-8
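    import sys

    def coroutine(func):
        """
        decorator that primes a coroutine so it is ready to receive send()

        see: http://www.dabeaz.com/coroutines/
        """
        def start(*args, **kwargs):
            cr = func(*args, **kwargs)
            next(cr)
            return cr
        return start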

    def input(theFile, pipe):
        """
        pushes a file a line at a time to a coroutine pipe
        """
        for line in theFile:
            pipe.send(line)
        pipe.close()

    @coroutine
    def extract(expression, pipe, group = 0):
        """
        extract the group from a regex
        """
        import re
        r = re.compile(expression)
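        try:
            while True:
                line = (yield)
                match = r.search(line)
                if match:
                    # honour the group argument instead of always using group 0
                    pipe.send(match.group(group))
        except GeneratorExit:
            # pass the end-of-input signal downstream
            pipe.close()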

    @coroutine
    def sort(pipe):
        """
        sort the input on a pipe
        """
        import heapq
        heap = []
        try:
            while True:
                line = (yield)
                heapq.heappush(heap, line)
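        except GeneratorExit:
            # input is finished: emit everything in sorted order, then
            # pass the end-of-input signal downstream
            while heap:
                pipe.send(heapq.heappop(heap))
            pipe.close()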

    @coroutine
    def group(groupPipe, pipe):
        """
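        sends each run of consecutive matching lines to a fresh groupPipe
        coroutine, which writes its result to pipe
        """
        cur = None
        g = None
        try:
            while True:
                line = (yield)
                if g is None or cur != line:
                    # a new run starts: close the previous group so it reports
                    if g is not None:
                        g.close()
                    g = groupPipe(pipe)
                g.send(line)
                cur = line
        except GeneratorExit:
            # flush the final group, then pass end-of-input downstream
            if g is not None:
                g.close()
            pipe.close()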

    @coroutine
    def uniq(pipe):
        """
        implements uniq -c
        """
        lines = 0
        line = None
        try:
            while True:
                line = (yield)
                lines += 1
        except GeneratorExit:
            # report only if at least one line was received
            if lines:
                pipe.send('%s\t%s' % (lines, line))

    @coroutine
    def output(theFile):
        while True:
            line = (yield)
            theFile.write(line + '\n')

    def main():
        input(sys.stdin,
            extract( r'^([^ ]+)',
                sort(
                    group( uniq,
                        output(sys.stdout)
                    )
                )
            )
        )

    if __name__ == '__main__':
        main()

You can achieve the same result on the Unix command line:

    cat access.log | cut -d ' ' -f 1 | sort | uniq -c
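
For comparison, when the input fits in memory, the same count can probably be had without coroutines by using collections.Counter. This is a minimal sketch rather than a replacement for the pipeline; the function name count_first_fields is made up for illustration, and the count-tab-value output format simply mirrors what uniq() sends.

```python
from collections import Counter


def count_first_fields(lines):
    """Count lines by their first space-separated field, like cut | sort | uniq -c."""
    counts = Counter(line.split(' ', 1)[0].rstrip('\n') for line in lines)
    # sorted() plays the role of sort; the Counter has already done uniq -c
    return ['%d\t%s' % (counts[key], key) for key in sorted(counts)]
```

Calling count_first_fields(open('access.log')) would return one string per distinct first field, in sorted order.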

Python Coroutines and Twitter

I was reading http://www.dabeaz.com/coroutines/ and thought coroutines were a natural fit for a Twitter client. Here is a pretty simple version that just prints the public timeline every 60 seconds. Next up: removing the time.sleep and scheduling the followStatus function as a task so I can follow more than one stream at a time.

    #!/usr/bin/env python
    # encoding: utf-8
    import time
    import twitter

    def coroutine(func):
        """
        A decorator function that takes care of starting a coroutine
        automatically on call.

        see: http://www.dabeaz.com/coroutines/
        """
        def start(*args,**kwargs):
            cr = func(*args,**kwargs)
            next(cr)
            return cr
        return start

    @coroutine
    def statusPrinter():
        """
        Just prints twitter status messages to the screen
        """
        while True:
            status = (yield)
            print('%s %s %s' % (status.id, status.user.name, status.text))

    def followStatus(twitterGetter, target, timeout = 60):
        """
        Polls a twitter getter that accepts since_id and sends each new
        status to target
        """
        since_id = None
        while True:
            statuses = twitterGetter(since_id=since_id)
            if statuses:
                # pretty sure these are always in order
                since_id = statuses[0].id
                for status in statuses:
                    target.send(status)
            # twitter caches for 60 seconds anyway
            time.sleep(timeout)

    def main():
        api = twitter.Api()
        followStatus(api.GetPublicTimeline, statusPrinter())

    if __name__ == '__main__':
        main()
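
The next step mentioned above (dropping time.sleep and running followStatus as a task) might look roughly like the sketch below. It is only one possible design, not something the twitter library provides: follow and run are hypothetical names, statuses are assumed to be plain dicts with an "id" key, target is assumed to be any callable (for example the send method of a coroutine), and the clock and sleep parameters exist only so the loop can be driven by a fake clock.

```python
import heapq
import time


def follow(getter, target, interval=60):
    """Generator task: poll getter, forward new statuses to target, then
    yield the number of seconds to wait instead of calling time.sleep."""
    since_id = None
    while True:
        statuses = getter(since_id=since_id)
        if statuses:
            since_id = statuses[0]["id"]   # assumes newest-first ordering
            for status in statuses:
                target(status)
        yield interval


def run(tasks, max_steps, clock=time.time, sleep=time.sleep):
    """Drive several generator tasks from one loop, waking each when due."""
    queue = []
    order = 0                              # tie-breaker so tasks are never compared
    for task in tasks:
        heapq.heappush(queue, (clock(), order, task))
        order += 1
    for step in range(max_steps):
        if not queue:
            break
        wake_at, _order, task = heapq.heappop(queue)
        delay = wake_at - clock()
        if delay > 0:
            sleep(delay)                   # sleep only until the next task is due
        try:
            wait = next(task)
        except StopIteration:
            continue                       # finished tasks are simply dropped
        heapq.heappush(queue, (clock() + wait, order, task))
        order += 1
```

With this shape, following a second stream is just another follow(...) generator in the list passed to run; a single loop sleeps until the earliest task is due, so no task blocks the others.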