I spend my day working as a data scientist for Hugging Face. Before that, I worked at DataRobot, Snorkel AI, Caterpillar, and State Farm. I have experience in a wide-ranging set of areas, including supply chain, machines, actuarial ratings, telematics, geospatial, and security projects.
I conduct research, teach, and engage with other academic researchers on public policy issues involving software and algorithms, as well as surveillance (especially in Chicago).
I have a passion for understanding the dynamics between technology and people. I was educated as an electrical engineer, went to law school, but found in communications a home for my research.
I left academia a few years ago and now work as a data scientist for Hugging Face. To keep up on my recent AI talks, please check out my page on talks.
View my Works
I have been focusing my efforts on working with customers, but I still make public contributions in the form of blog posts, research, videos, and code.
My notebooks on building an image classifier using Keras in either R or Python.
Presented at UseR 2016 at Stanford on my Outlier App.
Presented at KDD 2016 in San Francisco on Applying Deep Learning to Basketball Trajectories.
Created a YouTube video channel, Rajistics, with videos on deep learning.
Book chapter on the history of surveillance in Chicago.
Empirical study on cameras and crime in Chicago
I have been studying surveillance in Chicago for over ten years.
I have been interviewed many times as an expert on Chicago; specific interviews are listed under the media appearances link at the bottom of the page. My published work includes a book chapter documenting the recent history of surveillance in Chicago and a paper studying the effectiveness of the blue light surveillance cameras in Chicago.
Shah, R. C., & McQuade, B. (forthcoming 2016). Surveillance, Security, and Intelligence-Led Policing in Chicago. In L. Bennett, R. Garner, & E. Hague (Eds.), Neoliberal Chicago. University of Illinois Press.
Shah, R. C., & Braithwaite, J. (2012). Spread Too Thin: Analyzing the Effectiveness of the Chicago Camera Network on Crime. Police Practice and Research: An International Journal.
In this paper, researchers evaluated two studies that analyzed the effectiveness of Chicago’s camera network in reducing crime. Chicago has one of the largest urban surveillance networks, with over 1,000 cameras. The analysis found that the initial crime level of the area where a camera was placed had a significant effect. In areas with high crime, cameras were very effective in reducing crime. In other areas, the cameras had little effect. This exploratory research suggests that fewer cameras in crime hotspots are much more effective than a wide, diffuse camera dragnet.
Analyzing their effectiveness in Chicago
I have been studying red light cameras in Chicago since 2009. My study in 2010 was the first published study to cast doubt on the city's claim that red light cameras carried a significant safety benefit. I follow the latest events on red light cameras at my blog: EyeingChicago.com
Shah, R.C. (2010). Effectiveness of Red Light Cameras in Chicago: An Exploratory Analysis. Published at EyeingChicago.com
The results here mirror my earlier study. Despite the millions of dollars invested in RLCs and half a billion dollars in tickets, there is no evidence that the RLCs have had a significant safety benefit.
Exploring NBA motion data
This work uses a rich set of NBA motion data (over a billion rows) for over 600 games in the 2015-2016 season. This is a wonderful dataset to analyze from a telematics or Internet of Things (IoT) perspective. Additionally, there is a wealth of theories and analytics around basketball.
I created a set of notebooks that cover EDA, merging play-by-play data, measuring player spacing using convex hulls, calculating velocity/acceleration, and analyzing player/ball trajectories.
The notebooks are based on R code (rstats).
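The notebooks themselves are in R. As a rough illustration only, here is a minimal Python sketch of two of the calculations mentioned above (player spacing via a convex hull, and speed/acceleration from successive positions). It assumes positions arrive as (x, y) court coordinates sampled at a fixed interval. The function names, the sample coordinates, and the 0.5-second interval are all hypothetical and are not taken from the notebooks.

```python
from math import hypot


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a left (counter-clockwise) turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half because it repeats in the other half.
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area enclosed by the convex hull (shoelace formula)."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return 0.0
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def speeds(track, dt):
    """Speed between consecutive positions sampled every dt seconds."""
    return [hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def accelerations(track, dt):
    """Change in speed between consecutive intervals, per second."""
    v = speeds(track, dt)
    return [(b - a) / dt for a, b in zip(v, v[1:])]


# Made-up positions for five players on one team at a single moment.
team = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)]
spacing = hull_area(team)              # 100.0

# Made-up track for one player, sampled every 0.5 seconds.
track = [(0, 0), (3, 4), (9, 12)]
v = speeds(track, 0.5)                 # [10.0, 20.0]
a = accelerations(track, 0.5)          # [20.0]
```

A larger hull area means the five players are spread more widely across the court, which is one simple way to turn raw coordinates into a spacing measure.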
A journey exploring Spark
This work highlights my knowledge and use of Spark for data science work.
My GitHub repo includes a set of notebooks showing basic use of Spark with Scala, such as a recommender, a predictive model, and outlier detection using H2O.
In addition to the notebooks, I gave a talk to the Chicago Spark user group on the issues around using Spark for data science.
Chicago Divvy: A Day in the Life
This project visualizes 27 Divvy bikes on July 1st, 2014 in Chicago. This small sample of Divvy bike data allowed for an impressive animation of the movement of Divvy bikes through the city.
The data was found at the Divvy Bikes web site.
An app to explore the Chicago food inspection prediction model
Developed an interactive application to explore the Chicago food prediction algorithm. The app allows users to try different models and variable combinations and gauge their effect.
The app was created using RStudio's Shiny and incorporates algorithms such as glmnet, random forests, and logistic regression.
An app illustrating approaches to outlier detection
The app allows you to see the trade-offs among various types of outlier / anomaly detection algorithms. Outliers are marked with a star and cluster centers with an X.
The app is a Shiny app that uses a number of R packages, including algorithms for k-means, fuzzy k-means, hierarchical clustering, DBSCAN, isolation forest, and an autoencoder.
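The app itself is built with R packages. To give a feel for the simplest of the approaches listed above, here is a minimal Python sketch of k-means-based outlier flagging: cluster the points, then flag any point that sits farther than a threshold from its nearest cluster center. The data, the starting centers, and the threshold of 5.0 are made up for illustration and are not taken from the app.

```python
from math import dist


def kmeans(points, centers, iters=10):
    """Plain Lloyd's algorithm starting from the given centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if empty).
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers


def flag_outliers(points, centers, threshold):
    """Points whose distance to the nearest center exceeds the threshold."""
    return [p for p in points
            if min(dist(p, c) for c in centers) > threshold]


# Two tight made-up clusters plus one stray point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 20),
]
centers = kmeans(points, centers=[(0, 0), (10, 10)])
outliers = flag_outliers(points, centers, threshold=5.0)   # [(5, 20)]
```

The trade-off shows up even in this toy case: the stray point drags its cluster center toward it, so the threshold has to be chosen with that distortion in mind. Density-based methods such as DBSCAN avoid cluster centers entirely.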
A spatial analysis showing dangerous roads.
A visualization of roads in Chicago that highlights their danger level. The visualization uses publicly available crash data and traffic volume data.
This visualization was done as part of a proposed navigational app to warn people of dangerous intersections. This work was done with Aaron Moore in ArcGIS and CartoDB.
Using Bokeh to visualize Word2Vec clustering.
This page presents a text clustering example using 40,000 cases from the Seventh Circuit Court of Appeals. The visualization puts similar words closer together, and the colors represent distinct clusters of words. I used Word2Vec with k-means for the clustering analysis. The results are then presented using Bokeh.
This work was done in Python with NLTK, Word2Vec, and Bokeh. I did this to show my skills with unstructured data, neural networks, natural language processing, and visualization tools.
The benefits of open standards, studying ODF, and lessons from Massachusetts
This work stems from my academic work studying open standards. I have listed the significant publications and am happy to talk about the research.
For background, open standards are publicly available specifications that offer a wealth of economic and technological benefits. Governments around the world are considering mandating open standards, especially in the area of document formats.
Shah, R. C., Kesan, J. P., & Kennis, A. (2008). Lessons for Government Adoption of Open Standards: A Case Study of the Massachusetts Policy. Journal of Information Technology & Politics, 5(4), 387-398.
Shah, R. C., & Kesan, J. P. (2009). Running Code as Part of an Open Standards Policy. First Monday 6(1).
Shah, R.C., & Kesan, J.P. (2012). Lost in Translation: Interoperability Issues for Open Standards. I/S: A Journal of Law and Policy for the Information Society 8(1), 113-141.
Shah, R. C., & Kesan, J. P. (draft). An Empirical Study of Open Standards. (A revised version won Best Paper Award for E-Government Track at HICSS 41)
An R/Shiny app for interactive RNN TensorFlow models
This project created an RStudio Shiny app for a deep learning TensorFlow application. The app allows trying different inputs, RNN cell types, and even optimizers. The results are shown with plots as well as a link to TensorBoard. This app allows anyone to experiment with deep learning through a GUI.
The influence of defaults and nudges
This work stems from my academic work studying defaults. I have listed the significant publications and am happy to talk about the research.
Defaults are pre-selected options chosen by a developer. Users tend to defer to these pre-selected options. Policymakers can take advantage of this deference in setting defaults.
Shah, R. C., & Kesan, J. P. (2008). Setting Online Policy With Software Defaults. Information, Communication & Society, 11(7), 989-1007.
Shah, R. C., & Sandvig, C. (2008). Defaults as De Facto Regulation: The Case of Wireless Access Points. Information, Communication & Society, 11(1), 25-46.
Kesan, J. P., & Shah, R.C. (2006). Setting Software Defaults: Perspectives from Law, Computer Science and Behavioral Economics. Notre Dame Law Review, 82(2), 583-634.
The influence of architecture generally
This work stems from my early academic work studying how architecture affects behavior. I have listed the significant publications and am happy to talk about the research.
The first paper looks at physical architecture (the built environment), while the second paper focuses on ways software can be designed to influence behavior.
Shah, R. C., & Kesan, J. P. (2007). How Architecture Regulates. Journal of Architectural and Planning Research, 24(4), 350-359.
Shah, R. C., & Kesan, J. P. (2003). Manipulating the Governance Characteristics of Code. Info, 5(4), 3-9.
Institutional perspective on the development of software
This work stems from my academic work studying the development of software with an emphasis on the role of several institutions including universities, firms, consortia, and the open source movement. I have listed the significant publications and am happy to talk about the research.
For each institution, the analysis examines their internal processes and norms that affect the development process. The analysis also examines how each institution emphasizes different social and technical attributes that are embedded in code.
Shah, R. C., & Kesan, J. P. (2009). Recipes for Cookies: How Institutions Shape Communication Technologies. New Media & Society, 11(3), 315-336.
Shah, R. C., & Kesan, J. P. (2005). Nurturing Software: How Societal Institutions Shape the Development of Software. Communications of the ACM, 48(9), 80-85.
Kesan, J. P., & Shah, R. C. (2004). Deconstructing Code. Yale Journal of Law & Technology, 6, 277-389.
Shah, R. C., & Kesan, J. P. (2003). Incorporating Societal Concerns into Communication Technologies. IEEE Technology and Society Magazine, 22(2), 28-33.
Strategies government can use to influence software
This detailed paper shows the many ways government can influence the development of software/code. The methods include using the government’s regulatory power, fiscal power, and the ability to influence intellectual property rights.
Kesan, J. P., & Shah, R. C. (2005). Shaping Code. Harvard Journal of Law & Technology, 18(2), 319-399.
Tracing the privatization of the Internet
The Internet's origins date back to the 1960s with government funded research into computer networks. This work traces the history and implications of shifting control over the Internet to the private sector, a process called privatization.
Shah, R. C., & Kesan, J. P. (2007). The Privatization of the Internet's Backbone Network. Journal of Broadcasting and Electronic Media, 51(1), 93-109.
Kesan, J. P., & Shah, R. C. (2001). Fool Us Once Shame on You - Fool Us Twice Shame on Us: What We Can Learn from the Privatizations of the Internet Backbone Network and the Domain Name System. Washington University Law Quarterly, 79(1), 89-220.
A short analysis and visualization on Chicago's efforts to fix 250,000 potholes in the last few years.
This page provides some insights into the 250,000 potholes from 311 requests that Chicago has filled in the last few years. The project allowed me to use a variety of tools, including htmlwidgets and Torque on CartoDB, for a dynamic time series mapping visualization.
The project begins with a chart showing how many days it takes the city to fill a pothole. The vast majority are filled in less than a week after they have been reported.
Under "fixed potholes" is a chart showing how many potholes the city has fixed each day over the last few years. There is a clear seasonality to when potholes are fixed.
Finally, under movie, there is an animation on a map showing when and where potholes are reported and fixed.
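The time-to-fill and seasonality charts described above both come down to simple date arithmetic on request records. Here is a minimal Python sketch, assuming each record is a (created, completed) pair of dates. The four records are made up for illustration, and the layout is hypothetical rather than the actual 311 data schema.

```python
from collections import Counter
from datetime import date


def days_to_fill(requests):
    """Days between when a pothole was reported and when it was filled."""
    return [(completed - created).days for created, completed in requests]


def share_within(days, limit=7):
    """Fraction of requests completed in fewer than `limit` days."""
    return sum(1 for d in days if d < limit) / len(days)


def fixes_per_month(requests):
    """Count of completed requests by calendar month (1-12)."""
    return Counter(completed.month for _, completed in requests)


# Made-up (created, completed) pairs, not real 311 records.
requests = [
    (date(2014, 3, 3), date(2014, 3, 5)),
    (date(2014, 3, 10), date(2014, 3, 11)),
    (date(2014, 4, 1), date(2014, 4, 20)),
    (date(2014, 4, 7), date(2014, 4, 12)),
]

days = days_to_fill(requests)            # [2, 1, 19, 5]
within_week = share_within(days)         # 0.75
by_month = fixes_per_month(requests)     # Counter({3: 2, 4: 2})
```

Grouping completions by month, as in `fixes_per_month`, is one straightforward way to surface the kind of seasonality the chart shows.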
Apache, Cookies, Finger, NCSA Mosaic, and PICS
A set of technological case studies used in my research. The criteria for choosing these cases included representing a variety of institutional origins (e.g., universities, open source, ...) and affecting significant policy issues.
Apache: The development of Apache by the open source movement. Apache is the most widely used web server.
Cookies: Netscape's incorporation of the cookies technology into their web browser. Cookies are a technology that allows web sites to gather information about their visitors.
Finger: The development of the finger command, which reveals information about people on a computer network.
NCSA Mosaic: The development of the first popular web browser, NCSA (National Center for Supercomputing Applications) Mosaic, within a university.
PICS: The development of the Platform for Internet Content Selection (PICS) by the World Wide Web Consortium. PICS is a standard for labeling web pages for the purpose of limiting access to inappropriate material by minors.
Feel free to reach out to me using Twitter, email, or LinkedIn. I also have separate pages for my academic bio, publications, CV, and media mentions. The links for these are below.