## Compute the product of a column of data using SQL

In SQL, summing a column of numbers is relatively trivial, and the size of the dataset isn’t important. For example:

SELECT
SUM(X) as SUMOFX
FROM data_table;


Mathematically, what we are doing here is:

$$SUMOFX = \sum_{i=1}^n X_i$$

Now, if we find ourselves in the unfortunate situation to compute this:

$$PRODUCTOFX = \prod_{i=1}^n X_i$$

In this case, it won’t be as easy to use the SQL code above, but we can use Euler’s number and a property of the natural logarithm to also make this task a trivial one. The equation that we are going to exploit is:

$$ln(x*y) = ln(x) + ln(y)$$

which gives us a functional relationship between the product and sum of two numbers. This allows us to change our above SQL code to:

SELECT
exp(sum(log(X))) as PRODUCTOFX
FROM data_table;


It’s probably really rare for this to be useful in any sort of production environment, however, I hope that this trick could save someone the hassle of having to code up a custom solution to compute a product for data that’s already in a relational database.

Good luck!

## A new year’s post

I don’t think I’ve ever done this on the blog before, but a post by an data-minded colleague of mine has me inspired to put down, for all the internet to see, some of the things that I’ll be working on in the new year. Normally, I view personal growth as an ongoing affair and so I stay away from new year resolutions, but since one of my goals for 2014 is very related to writing I figured this would be a good way to start. Well, here we go with a few things I’d like to accomplish in 2014:

• Blog more: This is something that I’ve gotten a lot of value out of in the past and let slide lately. I’ve always focused on quality over quantity, so I’ll target an average of one post per month for now.
• Put more effort into OpenFisheries.org: This project hasn’t really been as flifilling as I’d hoped at the outset. Putting some more effort into the project sholid reap rewards in the future.
• Take Andrew Ng’s Machine Learning Course: Not much to say here. I’ve heard great things about the course, so I’ve registered and started watching the lectures.
• Invest more in other people: This is not a 180 for me. I feel I’ve always done a reasonably good job of sharing credit with others and being inclusive, however, this year I’ve been exposed to the idea that I can really help others grow in their careers by sharing experiences and offering challenges (delegating?) to those that are keen to accept. Of course, the idea of investing in others will also apply to my partner, but in a different way than career focused individuals.
• Complete a predictive CFL model: This is likely the most difficlit task in this list. CFL games are just so random. I don’t even want to get started on my (lack of) success up to this point trying to predict game outcomes. I might have to settle for a NFL prediction model, but either way, this sholid also be the most fun of the tasks on this list.

This is all for now. Until next time….

Earlier today I learned on twitter that Canada Post, a Federal Crown Corporation in Canada, had filed suit against the owner of geocoder.ca. This company is a small startup that operates a free online geocoding service, presumably as advertisement for services available by the company’s owner Ervin Ruci.

Although I was disappointed at the news from the beginning, it’s not always the case that the little guy is just minding their business when a corporate giant comes along with their legal team. I read both the Statement of Claim prepared on behalf of Canada Post and the Statement of Defence prepared on behalf of Geolytica, the legal name under which geocoder.ca operates. Below I outline some interesting and some comical highlights of the documents.

The basic claim by the plaintiff is simple, geocoder.ca is infringing on an asserted copyright over a database of Canadian postal codes by duplicating this information and selling it on the website. The statement of defence is far more interesting. It offers this gem:

Contrary to to the Plaintiff’s assertion at paragraph 11 of the Statement of Claim that “Her Majesty’s copyright to the CPC Database was transferred to Canada Post” under section 63 of the Canada Post Corporation, no section 63 of the current Canada Post Corporation Act even exists. Neither does the Act that came into force in 1981 transfer such title.

Seems that the CIPPIC, the organization defending Geolytica, have come out swinging. They follow-up this comment with this:

Further, even if copyright subsists in the CPC Database … the Defendant has not infringed any copyright because it has not produced, reproduced or copied the CPC Database, nor otherwise engaged in any act in respect of the CPC Database which only a copyright owner may do. Nor has the Defendant at any time accessed, or had access, to the CPCC Database.

You see, Ervin Ruci claims to have used a crowdsourcing technique to build information about postal codes in Canada. He did this over a period of years with an algorithm that parsed geocoding queries to his website. So, when someone searched for the address “2475 Bayswater Street, Vancouver, BC, V6K 4N3″, he has pretty good information that the address 2475 Bayswater Street, Vancouver, BC, lies within the area covered by V6K 4N3. Over time, he collected enough information to have a pretty good idea of the geographic coordinates associated with Canadian postal codes. Pretty neat idea.

One last tidbit from the Statement of Defence is this:

An address, including the postal code, is a fact and not an original work within the meaning of the Copyright Act. … If a postal code were not a fact, but were rather a copyrighted work, Canadians would regularly infringe Canada Post’s alleged sole right to produce and reproduce “any part” of the CPC Database, as such right is alleged at paragraph 5 of the Plaintiff’s Statement of Claim. Such a result is absurd.

What this is saying is that postal codes are much like the fact that water is the liquid state of H20. When I say that I used to reside at 2475 Bayswater Street, Vancouver, BC, it is like saying that the building I lived in was a mixed residential/commercial low-rise building. Similarly, the postal code V6K4N3, applies to that address and this is simply a fact rather than a copyright infringement.

I wish Ervin Ruci and his defence team CIPPI the best of luck in this case and hope that the judge sets some decent legal precendents so these types of innovation crushing legal moves can be avoided in the future. If you are so inclined, consider donating to Ervin’s legal defence fund here.

After reading a very insightful post a few weeks ago (that I’ve since lost the link to…sorry), I’ve decided to close the comments section on all posts on my blog. Yes, the spam is annoying to deal with, but I’ve had some great experiences over the years meeting people who comment on my posts. I don’t want to miss out on that, so I hope that readers will take the time to send me an e-mail or hit me up on twitter.

I hope that when potential commenters realize that they cannot leave their mark on a blog post of mine they will do one of two things:

1. Send me their comment in an e-mail and we can discuss. I’ll then update the post with changes and/or additions as necessary
2. Write a blog post, tweet, website, whatever of their own

I’d like to see potential commenters take that moment of insight that a post on this blog inspired and create something new of their own. This is especially important for a number of topics that I write about where there aren’t a million other blogs covering the territory. For example, there are only a handful of Stata focused blogs and it will be a long time before the open data scene suffers from over-coverage.

I hope that this little experiment is positive. If you have any comments on this experiment of mine, please feel free to get in contact with me. Or, write a blog post of your own with your thoughts on why blog comments are good, bad, or ugly.