Large language models will never be conscious
Data Science
philosophy
Many engineers who work in deep learning are worried about machines gaining consciousness. Over the last ten years there’s been an explosion in tools which use deep learning to solve problems which were previously only the domain of human beings, and this has led many people to believe that it’s just a matter of time before these systems gain consciousness. The worry is that if we’re not super careful, consciousness will just accidentally emerge from these systems and we will enter a kind of feedback loop where machine intelligence will take over our society. This is a genuine worry for people; it’s caused people to lose their jobs, and almost half of AI researchers believe there’s a real chance that AI will cause humans to become extinct. I think that this worry arises from a bad theory of consciousness, and that consciousness is not possible in our current AI systems.
Testing dbplyr packages
R
Data Science
Data science students are often told that SQL is the most important tool to learn. This advice makes some sense given how ubiquitous SQL is in industry, but I think it’s a bit overrated. I’ve been working as a data scientist for eight or so years, and am now a product manager on a large data platform product, but I’m still not entirely sure I really “know SQL” in any meaningful way. I can sort of read and write simple SQL statements, but I almost always use tools like dbplyr instead of writing SQL directly. Initially I thought this was a big gap in my knowledge base, and I felt that relying on SQL generators was a kind of crutch, but over time I’ve become convinced that this is a better way to work.
What MS taught me about the pandemic
Covid
Personal
I was diagnosed with multiple sclerosis about seven years ago. My first symptom was numbness in my feet, which I thought was caused by tight shoes, but by the time I was vacationing in Belize a few months later, I couldn’t feel much in my hands or from the chest down. I remember the feeling of swimming as though I were wrapped in a cotton sheet, I remember falling down because I didn’t have any proprioception in my feet, and I remember switching to slip-on shoes because I had trouble tying laces.
What’s the blockchain good for anyway?
People have invested about $3 trillion in blockchain technologies globally, but it’s still pretty hard to understand the value that is being produced by this technology. Can we really do anything now that we couldn’t do before? Is the whole industry just a giant financial bubble or is there real value?
Why the Economist’s excess death model is misleading
Data Science
Covid
The Economist has published a model which estimates that Kenyans are only detecting 4–25% of the true deaths attributable to Covid. I think this is a good opportunity to learn about why many machine learning models are problematic. I’m going to talk about this particular model, but I should note that I’ve only spent about ten hours looking at this problem, and I’m sure the authors of this model are smart, thoughtful people who don’t mean to mislead. That said, I think it’s an excellent example of how machine learning models can lend a sheen of credibility to things that are basically unsupported assertions. When someone says that their model says something, most people assume the model is backing that claim with hard data, when it’s often doing no such thing. It’s possible that the authors of this model have sound reasons for believing they can make global excess death predictions based on a small, unrepresentative sample of countries, but even so I think these observations are helpful for figuring out which models you should trust.
How to make good decisions from bad data
Data Science
Covid
Vitamin D
People often make a categorical distinction between randomized clinical trial data and other forms of data. Under this view the only information that can ground medical decision making is a large, multicenter, randomized clinical trial, and other study designs can only prove correlation, not causation. People who hold this view treat clinical trials as determinative of causation. Without a clinical trial you can’t make a causal claim, and once you have one, you no longer need to think that hard about causation.
Taking vitamin D back from the racists
Covid
Vitamin D
Black and brown people in northern countries have been disproportionately affected by Covid-19. In the US, Sweden, Canada, and the UK, racialized people have been more likely to contract the disease, more likely to have severe courses, and more likely to die from it. The explanation you usually get for this is that excess mortality is caused by systemic racism or social determinants of health. Under this explanation, there’s nothing that surprising about the high Covid mortality because it’s just another example of discriminatory health care policies. This explanation is too vague.
Masks and Vitamin D
Vitamin D
Covid
Imagine that someone offered you a free lottery ticket. You would have a small chance of winning a million dollars, but the ticket doesn’t cost anything. It would be silly to turn down this ticket because you thought your odds of winning were either too small or too unclear; the only reason we care about the odds of winning a game is so that we can determine if the expected value of winning is higher than the expected cost of playing. If the ticket is free, then so long as there is any chance you might win, the rational decision is to play.
Vitamin D and Covid-19
Vitamin D
In a recent piece about the puzzling ways that Covid-19 has spread across the world, the New York Times explores a number of possible theories about why Covid-19 has affected some countries more grievously than others, including “demographics, culture, environment, and the speed of government responses.” I think vitamin D status should probably be included in this conversation.
Why I Use R
R
Data Science
Over the last couple of years prominent members of both the R and Python communities have tried to move past the language wars and support both R and Python workflows. This makes sense intellectually; after all, R and Python are not all that different in the scheme of things, and so we should let people use whichever language they find more productive. This conversation manifests very differently in the workplace, however.
Most of the time when a Python data scientist hears that the language wars are over, they think “Well, great — if R and Python are equally effective, then we can all just standardize on Python.”
Technical debt for data scientists
Data Science
Technical debt is the process of avoiding work today by promising to do work tomorrow. A team might identify that there’s a small time window for a particular change to be implemented and the only way they can hit that window is to take shortcuts in the development process. They might soberly calculate that the benefits of getting something done now are worth the costs of fixing it later. This kind of technical debt is similar to taking out a mortgage or small business loan. You don’t have the money to realize an opportunity right now, so you borrow that money even though it’s going to cost more down the road. The lifetime cost of the investment goes up, but at least you get to make the investment.
Testing machine learning models with testthat
Data Science
Automated testing is a huge part of software development. Once a project reaches a certain level of complexity, the only way that it can be maintained is if it has a set of tests that identify the main functionality and allow you to verify that functionality is intact. Without tests, it’s difficult or impossible to identify where errors are occurring, and to fix those errors without causing further problems.
Advice for non-traditional data scientists
Data Science
Education
I have a pretty strange background for a data scientist. In my career I’ve sold electric razors, worked on credit derivatives during the 2008 financial crash, written market reports on orthopaedic biomaterials, and practiced law. I started programming in R during law school, partly as a way to learn more about data visualization and partly to help analyze youth criminal justice data. Over time I came to enjoy programming more than law and decided to make the switch to data work about three years ago. Since then I’ve freelanced a bit, worked as a Data Scientist at Upworthy, and now am a Senior R Developer at a survey company called Crunch.io.
R for Excel Users
Data Science
Education
Like most people, I first learned to work with numbers through an Excel spreadsheet. After graduating with an undergraduate philosophy degree, I somehow convinced a medical device marketing firm to give me a job writing Excel reports on the orthopedic biomaterials market. When I first started, I remember not knowing how to do anything, but after a few months I became fairly proficient with the tool and was able to build all sorts of useful models. When you think about it, this is an amazing feature of Excel. Every day, all over the world, people open up a spreadsheet to do some data entry and then, bit by bit, learn to do increasingly complex analytical tasks. Excel is a master at teaching people how to use Excel.