One of the major highlights of last month was the New York R conference (NYR), which ran on 10-11 September. The lineup featured speakers from diverse backgrounds, ranging from the military and academia to the private sector and investors in data science ventures. NYR provided me with a wealth of fresh perspectives on data science, so keep reading to find out more!
But first, I want to dwell a little on the richness of the R community, and on how events like NYR support innovation and a feeling of interconnectedness between data enthusiasts. At Vuzo, we have come to deeply appreciate this special strength that knits the R community together. The R language and its ecosystem encompass tens of thousands of data science-related software packages, which keep evolving thanks to a passionate, helpful and welcoming community of data enthusiasts. The NYR conference actually grew out of the New York Open Statistical Programming Meetup (aka the New York R Meetup), the largest in the world with almost 12,000 members. You can browse 11 years’ worth of meetup presentations here.
This year, the benefits of attending the conference revealed themselves as the two days progressed. In particular, I really enjoyed how multifaceted the takeaways were. For instance, some talks focused on new software developments and package releases, such as: Apache Arrow, which now interfaces with the R Tidyverse and also allows fast data transfers using Arrow Flight (see screenshot below); morphemepiece, which is useful in an NLP context for breaking down words into smaller tokens of meaning (e.g., ‘unbreakable’ = ‘un’ + ‘break’ + ‘able’); dbcooper, which simplifies the syntax for accessing data within databases; and Robyn, an offering from Facebook that brings together multiple modelling approaches in order to automate marketing data analysis.

Aside from software developments, the NYR conference also covered practical advice on modelling pitfalls. On the NLP side (an area Vuzo also uses to deliver insight), Rachael Tatman’s talk was jam-packed with actionable insights into the dos and don’ts of NLP. Max Kuhn, the well-known author of the caret R package for predictive modelling, covered covid-related fluctuations in data patterns and how to mitigate their impact as part of his talk. Please see screenshots from both talks below.
In tune with the classic adage that we also embrace here at Vuzo, ‘Your [data?] scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should’, NYR also put data science into perspective. Sonia Ang of Microsoft spoke on the ethical implications of data science work, and Andrew Gelman himself offered an overview of 30 years’ worth of statistical mistakes and how to avoid them.
Last but certainly not least, I found it refreshing that data collection was also covered at NYR. Typically, conference talks tend to focus on analysis directly, rather than on all the thought and effort that goes into collecting the necessary data in the first place. In this case, we were treated to a fun talk about all the facets (and data) that go into measuring the K-Pop phenomenon (a musical genre from Korea that has taken the world by storm), from social, to economic, to cultural aspects.
Everyone contributed to a wonderful couple of days of learning, and I am deeply thankful to all the speakers for the time they invested in sharing their work. It would also be remiss of me not to mention the organisers, Jared Lander and Nicole DelGiudice, and all their work and dedication in bringing everyone together. Notably, all this was in addition to Jared giving an excellent talk himself on GPU computing, which is no small feat when carried out in parallel with all the other organisational tasks!
You too can enjoy these talks soon, as the organisers will be uploading the recordings here. Please do check for updates at your convenience. Meanwhile, I already cannot wait for next year’s NYR!