SQL generators are good actually

Data science students are often told that SQL is the most important tool to learn. This advice makes some sense given how ubiquitous SQL is in industry, but I think it’s a bit overrated. I’ve been working as a data scientist for eight or so years, and am now a product manager on a large data platform product, but I’m still not entirely sure I really “know SQL” in any meaningful way.


I was diagnosed with multiple sclerosis about seven years ago. My first symptom was numbness in my feet, which I thought was caused by tight shoes, but by the time I was vacationing in Belize a few months later, I couldn’t feel much in my hands, or from the chest down. I remember the feeling of swimming as though I were wrapped in a cotton sheet, I remember falling down because I didn’t have any proprioception in my feet, and I remember switching to slip-on shoes because I had trouble tying laces.


People have invested about $3 trillion in blockchain technologies globally, but it’s still pretty hard to understand the value that this technology is producing. Can we really do anything now that we couldn’t do before? Is the whole industry just a giant financial bubble, or is there real value? Most new technologies go through this phase. For example, in the late 1990s the internet and e-commerce seemed like transformative technologies, but all of the examples of those technologies were kind of silly.


The Economist has published a model which estimates that Kenyans are only detecting 4-25% of the true deaths which can be attributed to Covid. I think this is a good opportunity to learn about why many machine learning models are problematic. I’m going to talk about this particular model, but I should note that I’ve only spent about ten hours looking at this problem, and I’m sure the authors of this model are smart, thoughtful people who don’t mean to mislead.


People often make a categorical distinction between randomized clinical trial data and other forms of data. Under this view the only information that can ground medical decision making is a large, multicenter, randomized clinical trial, and other study designs can only prove correlation, not causation. People who hold this view treat clinical trials as determinative of causation. Without a clinical trial you can’t make a causal claim, and once you have one, you no longer need to think that hard about causation.


Black and brown people in northern countries have been disproportionately affected by Covid-19. In the US, Sweden, Canada, and the UK, racialized people have been more likely to contract the disease, more likely to have severe courses, and more likely to die from it. The explanation you usually get for this is that excess mortality is caused by systemic racism or social determinants of health. Under this explanation, there’s nothing that surprising about the high Covid mortality because it’s just another example of discriminatory health care policies.


Imagine that someone offered you a free lottery ticket. You would have a small chance of winning a million dollars, but the ticket doesn’t cost anything. It would be silly to turn down this ticket because you thought your odds of winning were either too small or too unclear; the only reason we care about the odds of winning a game is so that we can determine if the expected value of winning is higher than the expected cost of playing.



Gordon Shotwell


Lead Data Scientist at Socure

Canada