Compute the product of a column of data using SQL

In SQL, summing a column of numbers is relatively trivial, and the size of the dataset isn’t important. For example:

SELECT 
  SUM(X) as SUMOFX
FROM data_table;

Mathematically, what we are doing here is:

$$SUMOFX = \sum_{i=1}^n X_i$$

Now, if we find ourselves in the unfortunate situation to compute this:

$$PRODUCTOFX = \prod_{i=1}^n X_i$$

In this case, it won’t be as easy to use the SQL code above, but we can use Euler’s number and a property of the natural logarithm to also make this task a trivial one. The equation that we are going to exploit is:

$$ln(x*y) = ln(x) + ln(y)$$

which gives us a functional relationship between the product and sum of two numbers. This allows us to change our above SQL code to:

SELECT
  exp(sum(log(X))) as PRODUCTOFX
FROM data_table;

It’s probably really rare for this to be useful in any sort of production environment, however, I hope that this trick could save someone the hassle of having to code up a custom solution to compute a product for data that’s already in a relational database.

Good luck!

A new year’s post

I don’t think I’ve ever done this on the blog before, but a post by an data-minded colleague of mine has me inspired to put down, for all the internet to see, some of the things that I’ll be working on in the new year. Normally, I view personal growth as an ongoing affair and so I stay away from new year resolutions, but since one of my goals for 2014 is very related to writing I figured this would be a good way to start. Well, here we go with a few things I’d like to accomplish in 2014:

  • Blog more: This is something that I’ve gotten a lot of value out of in the past and let slide lately. I’ve always focused on quality over quantity, so I’ll target an average of one post per month for now.
  • Put more effort into OpenFisheries.org: This project hasn’t really been as flifilling as I’d hoped at the outset. Putting some more effort into the project sholid reap rewards in the future.
  • Take Andrew Ng’s Machine Learning Course: Not much to say here. I’ve heard great things about the course, so I’ve registered and started watching the lectures.
  • Invest more in other people: This is not a 180 for me. I feel I’ve always done a reasonably good job of sharing credit with others and being inclusive, however, this year I’ve been exposed to the idea that I can really help others grow in their careers by sharing experiences and offering challenges (delegating?) to those that are keen to accept. Of course, the idea of investing in others will also apply to my partner, but in a different way than career focused individuals.
  • Complete a predictive CFL model: This is likely the most difficlit task in this list. CFL games are just so random. I don’t even want to get started on my (lack of) success up to this point trying to predict game outcomes. I might have to settle for a NFL prediction model, but either way, this sholid also be the most fun of the tasks on this list.

This is all for now. Until next time….

Canada post sues Canadian web2.0 & gis pioneer

Earlier today I learned on twitter that Canada Post, a Federal Crown Corporation in Canada, had filed suit against the owner of geocoder.ca. This company is a small startup that operates a free online geocoding service, presumably as advertisement for services available by the company’s owner Ervin Ruci.

Although I was disappointed at the news from the beginning, it’s not always the case that the little guy is just minding their business when a corporate giant comes along with their legal team. I read both the Statement of Claim prepared on behalf of Canada Post and the Statement of Defence prepared on behalf of Geolytica, the legal name under which geocoder.ca operates. Below I outline some interesting and some comical highlights of the documents.

The basic claim by the plaintiff is simple, geocoder.ca is infringing on an asserted copyright over a database of Canadian postal codes by duplicating this information and selling it on the website. The statement of defence is far more interesting. It offers this gem:

Contrary to to the Plaintiff’s assertion at paragraph 11 of the Statement of Claim that “Her Majesty’s copyright to the CPC Database was transferred to Canada Post” under section 63 of the Canada Post Corporation, no section 63 of the current Canada Post Corporation Act even exists. Neither does the Act that came into force in 1981 transfer such title.

Seems that the CIPPIC, the organization defending Geolytica, have come out swinging. They follow-up this comment with this:

Further, even if copyright subsists in the CPC Database … the Defendant has not infringed any copyright because it has not produced, reproduced or copied the CPC Database, nor otherwise engaged in any act in respect of the CPC Database which only a copyright owner may do. Nor has the Defendant at any time accessed, or had access, to the CPCC Database.

You see, Ervin Ruci claims to have used a crowdsourcing technique to build information about postal codes in Canada. He did this over a period of years with an algorithm that parsed geocoding queries to his website. So, when someone searched for the address “2475 Bayswater Street, Vancouver, BC, V6K 4N3″, he has pretty good information that the address 2475 Bayswater Street, Vancouver, BC, lies within the area covered by V6K 4N3. Over time, he collected enough information to have a pretty good idea of the geographic coordinates associated with Canadian postal codes. Pretty neat idea.

One last tidbit from the Statement of Defence is this:

An address, including the postal code, is a fact and not an original work within the meaning of the Copyright Act. … If a postal code were not a fact, but were rather a copyrighted work, Canadians would regularly infringe Canada Post’s alleged sole right to produce and reproduce “any part” of the CPC Database, as such right is alleged at paragraph 5 of the Plaintiff’s Statement of Claim. Such a result is absurd.

What this is saying is that postal codes are much like the fact that water is the liquid state of H20. When I say that I used to reside at 2475 Bayswater Street, Vancouver, BC, it is like saying that the building I lived in was a mixed residential/commercial low-rise building. Similarly, the postal code V6K4N3, applies to that address and this is simply a fact rather than a copyright infringement.

The reason this is absurd, as the statement of defence states, is that if “any part” of Canada Post’s allegedly copyrighted database of postal codes is subject to that copyright, then I have just infringed that copyright by writing an old address of mine in this blog post just the same as thousands of Canadians would be infringing the copyright every day by writing their address and postal code on envelopes, forms, job applications, etc. Likewise, any number of businesses that ask customers to share their address for mailing purposes would also be guilty of infringing this copyright by storing their mailing list rather than purchases the Canada Post database.

I wish Ervin Ruci and his defence team CIPPI the best of luck in this case and hope that the judge sets some decent legal precendents so these types of innovation crushing legal moves can be avoided in the future. If you are so inclined, consider donating to Ervin’s legal defence fund here.

Comments closed

After reading a very insightful post a few weeks ago (that I’ve since lost the link to…sorry), I’ve decided to close the comments section on all posts on my blog. Yes, the spam is annoying to deal with, but I’ve had some great experiences over the years meeting people who comment on my posts. I don’t want to miss out on that, so I hope that readers will take the time to send me an e-mail or hit me up on twitter.

I hope that by forcing blog comments to go through by e-mail inbox I will make a more personal connection with those who contact me. But, by far what I see as the biggest potential benefit of moving comments free is that it may encourage more people to start blogging or engaging with others on Twitter. It’s very easy to leave a comment on a blog post and never come back to follow the discussion. Also, when you do leave a comment on someone’s blog, it becomes more about them than about your own work — maybe one reason blog comment sections are a favoured home of internet trolls.

I hope that when potential commenters realize that they cannot leave their mark on a blog post of mine they will do one of two things:

  1. Send me their comment in an e-mail and we can discuss. I’ll then update the post with changes and/or additions as necessary
  2. Write a blog post, tweet, website, whatever of their own

I’d like to see potential commenters take that moment of insight that a post on this blog inspired and create something new of their own. This is especially important for a number of topics that I write about where there aren’t a million other blogs covering the territory. For example, there are only a handful of Stata focused blogs and it will be a long time before the open data scene suffers from over-coverage.

I hope that this little experiment is positive. If you have any comments on this experiment of mine, please feel free to get in contact with me. Or, write a blog post of your own with your thoughts on why blog comments are good, bad, or ugly.

Open Data Saskatchewan begins

The move towards more open data, both in government and business, seems to be gaining steam around the world. As a long-time user of the World Development Indicators (WDI), I was impressed when the World Bank (WB) decided to open up their data repository to the world and started the Apps for Development challenge to spur innovation using these datasets.

Although the honour of first prize in the Apps for Development challenge went to StatPlanet, which visualizes WDI data on a global map, I found the second and third prize winners interesting as well. In second prize, the development timelines app allows one to compare historical development statistics through time across countries. Third prize in the challenge was claimed by the Yourtopia app, which allows one to mashup WDI data to show which countries around the world are most like your vision of a utopia.

After this contest, I was absolutely floored when it was announced that Kenya would be taking the plunge into open data with a national data portal. I cannot wait to see the applications of this data in the near future.

I’ve also had the pleasure of being involved in the open data movement in British Columbia (BC) after their highly successful data portal was launched. At a hack-a-thon, where developers, enthusiasts and other citizens gather to work with open data, I was involved in creating the bones of an occupation explorer application that seeks to put data into the hands of high school students as they make education decisions about their future careers. Other great examples of applications created in hackathons at these events in BC include:

With all these examples of open data applications and the rapid economic growth in my home province of Saskatchewan in mind, I hoped to see similar and even greater success here. With a great help and advice from other open data enthusiasts around the country I started an organization called OpenDataSK to promote open data use in Saskatchewan. So far the response has been much greater than I dreamed. I may have been in a little hasty in organizing our first hackathon in the province in order to meet International Open Data day on December 3, 2011, however, even this looks to be promising.

I’ve collected links to several data sources in the province that could be used at the hackathon and now I cannot wait to see what happens with data enthusiasts congregate and creativity starts flowing. If you are in the province, or within reasonable travel time, I encourage you to consider attending. The location is yet to be determined, however, information when it becomes available will be both on the OpenDataSK website and the Eventbrite event page.