RAUL CASTRO FERNANDEZ

Assistant Professor, Computer Science
Affiliated Faculty, Data Science Institute
The University of Chicago, Chicago, IL, USA

Office 245, John Crerar Library

My goal is to devise ways of gaining value from data. We work on problems in data science and data management that include designing data environments such as data markets, data sharing, dataflow governance, data discovery, integration, and processing -- overall I call this research line 'data ecology'.

I am recruiting postdoctoral researchers: reach out if interested.
I keep an always-evolving research statement here (coming soon...).
I sometimes write informally about our research as well as other thoughts on this blog.

RECENT NEWS

OCTOBER, 2024

New SafeInsights Project

SafeInsights project kick-off meeting!

SEPTEMBER, 2024

New Members Join the Group

Joyce Chen and Hrishee Shastri join our group

AUGUST, 2024

VLDB'24

Tapan presents Arachne at VLDB

JUNE, 2024

SIGMOD'24

The group presented some of our newer work at SIGMOD in Santiago de Chile

PUBLICATIONS

2024

Saving Money for Analytical Workloads in the Cloud. Tapan Srivastava, Raul Castro Fernandez VLDB 2024 (New)

Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach. Qiming Wang, Raul Castro Fernandez SIGMOD 2024

Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data. Yue Gong, Sainyam Galhotra, Raul Castro Fernandez SIGMOD 2024

Cackle: Analytical Workload Cost and Performance Stability with Elastic Pools. Matthew Perron, Raul Castro Fernandez, David DeWitt, Michael Cafarella, Samuel Madden SIGMOD 2024

Responsible Sharing of Spatiotemporal Data. Raul Castro Fernandez, Arnab Nandi SIGMOD 2024 (Tutorial)

Demonstration of Ver: View Discovery in the Wild. Kevin Dharmawan, Chirag Kawediya, Yue Gong, Zaki Indra Yudhistira, Zhiru Zhu, Sainyam Galhotra, Adila Alfa Krisnadhi, Raul Castro Fernandez SIGMOD 2024 (Demo)

Demonstrating Nexus for Correlation Discovery over Collections of Spatio-Temporal Tabular Data. Yue Gong, Raul Castro Fernandez SIGMOD 2024 (Demo)


2023

How Large Language Models Will Disrupt Data Management. Raul Castro Fernandez, Aaron Elmore, Michael Franklin, Sanjay Krishnan, Chenhao Tan. VLDB 2023

Data and AI Model Markets: Grand Opportunities for Data and Model Sharing, Discovery, and Integration. Jian Pei, Raul Castro Fernandez, Xiaohui Yu. VLDB 2023 (Tutorial)
Saibot: A Differentially Private Data Search Platform. Zezhou Huang, Jiaxiang Liu, Daniel Gbenga Alabi, Raul Castro Fernandez, Eugene Wu. VLDB 2023
Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using an Adaptive Sampling Algorithm. Boxin Zhao, Boxiang Lyu, Raul Castro Fernandez, Mladen Kolar. ICML 2023
Data-Sharing Markets: Model, Protocol, and Algorithms to Incentivize the Formation of Data-Sharing Consortia. Raul Castro Fernandez. SIGMOD 2023
Metam: Goal-Oriented Data Discovery. Sainyam Galhotra, Yue Gong, Raul Castro Fernandez. ICDE 2023
Ver: View-Discovery in the Wild. Yue Gong, Zhiru Zhu, Sainyam Galhotra, Raul Castro Fernandez. ICDE 2023

2022

Data Station: Delegated, Trustworthy, and Auditable Computation to Enable Data-Sharing Consortia with a Data Escrow. Siyuan Xia, Zhiru Zhu, Chris Zhu, Jinjin Zhao, Kyle Chard, Aaron Elmore, Ian Foster, Michael Franklin, Sanjay Krishnan, Raul Castro Fernandez. VLDB 2022
Revisiting Online Data Markets in 2022. A Seller and Buyer Perspective. Javen Kennedy, Pranav Subramaniam, Sainyam Galhotra, Raul Castro Fernandez. SIGMOD Record
Enabling AI Innovation via Data and Model Sharing: An Overview of the NSF Convergence Accelerator Track D. Several authors. AI Magazine
Protecting Data Markets from Strategic Buyers. Raul Castro Fernandez. SIGMOD 2022
Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation. Alex Zhao, Raul Castro Fernandez. SIGMOD 2022

2020

Data Market Platforms: Trading Data Assets to Solve Data Problems. Raul Castro Fernandez, Pranav Subramaniam, Michael Franklin. VLDB 2020
ARDA: Automatic Relational Data Augmentation for Machine Learning. Nadiia Chepurko, Ryan Marcus, Emanuel Zgraggen, Raul Castro Fernandez, Tim Kraska, David Karger. VLDB 2020
Starling: A Scalable Query Engine on Cloud Function Services. Matt Perron, Raul Castro Fernandez, David DeWitt, Samuel Madden. SIGMOD 2020
A System for Studying Deep Network Training. Raul Castro Fernandez. CIDR’20 (Abstract)

2019

Lazo: A Cardinality-Based Method for Coupled Estimation of Jaccard Similarity and Containment. Raul Castro Fernandez, Jisoo Min, Demitri Devada, Samuel Madden. ICDE’19
Termite: A System for Tunneling Through Heterogeneous Data. Raul Castro Fernandez, Samuel Madden. AIDM@SIGMOD’19
Raha: A Configuration-Free Error Detection System. Mohammad Mahdavi, Ziawasch Abedjan, Raul Castro Fernandez, Sam Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang. SIGMOD’19
Aurum: A Story About Research Taste. Raul Castro Fernandez. Making Databases Work. ACM Morgan & Claypool. 2019

2018

Aurum: A Data Discovery System. Raul Castro Fernandez, Ziawasch Abedjan, Famien Koko, Gina Yuan, Samuel Madden, Michael Stonebraker. ICDE’18
Seeping Semantics: Linking Datasets using Word Embeddings for Data Discovery. Raul Castro Fernandez, Essam Mansour, Abdulhakim Qahtan, Ahmed Elmagarmid, Ihab Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang. ICDE’18
Meta-Dataflows: Efficient Exploratory Dataflow Jobs. Raul Castro Fernandez, William Culhane, Pijika Watcharapichat, Matthias Weidlich, Victoria Lopez Morales, Peter Pietzuch. SIGMOD’18
Extracting Syntactical Patterns from Databases. Andrew Ilyas, Joana M. F. da Trindade, Raul Castro Fernandez, Samuel Madden. ICDE’18
FAHES: A Robust Disguised Missing Values Detector. Mourad Ouzzani, Nan Tang, Ahmed Elmagarmid, Raul Castro Fernandez, Abdulhakim A. Qahtan. KDD’18
Building Data Civilizer Pipelines with an Advanced Workflow Engine. Essam Mansour, Dong Deng, Raul Castro Fernandez, Abdulhakim Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang. (Demo) ICDE’18

2017

Quill: Efficient, Transferable, and Rich Analytics at Scale. Badrish Chandramouli, Raul Castro Fernandez, Jonathan Goldstein, Ahmed Eldawy, Abdul Quamar. VLDB’17
The Data Civilizer System. Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed Elmagarmid, Ihab Ilyas, Samuel Madden, Mourad Ouzzani, Nan Tang. CIDR’17
A Demo of the Data Civilizer System. Raul Castro Fernandez, Dong Deng, Essam Mansour, Abdulhakim A Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang. (Demo) SIGMOD’17

2016

Ako: Decentralised Deep Learning with Partial Gradient Exchange. Pijika Watcharapichat, Victoria Lopez Morales, Raul Castro Fernandez, Peter Pietzuch. SOCC’16
Detecting Data Errors: Where are we and what needs to be done? Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker, Nan Tang. VLDB’16
Towards Large-Scale Data Discovery. Raul Castro Fernandez, Ziawasch Abedjan, Samuel Madden, Michael Stonebraker. ExploreDB@SIGMOD’16
SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures. Alexandros Koliousis, Matthias Weidlich, Raul Castro Fernandez, Paolo Costa, Alexander Wolf, Peter Pietzuch. SIGMOD’16
Java2SDG: Stateful Big Data Processing for the Masses. Raul Castro Fernandez, Panagiotis Garefalakis, Peter Pietzuch. (Demo) ICDE’16

2015

Liquid: Unifying Nearline and Offline Big Data Integration. Raul Castro Fernandez, Peter Pietzuch, Joel Koshy, Jay Kreps, Dong Lin, Neha Narkhede, Jun Rao, Chris Riccomini, Guozhang Wang. CIDR’15

2014

Making State Explicit for Imperative Big Data Processing. Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki and Peter Pietzuch. USENIX ATC’14
Grand Challenge: Scalable Stateful Stream Processing for Smart Grids. Raul Castro Fernandez, Matthias Weidlich, Peter Pietzuch and Avigdor Gal. DEBS’14

2013

Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management. Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki and Peter Pietzuch. SIGMOD’13 (SIGMOD’23 Test of Time Award)
Towards Low-Latency and In-Memory Large-Scale Data Processing. Raul Castro Fernandez and Peter Pietzuch. PhD Workshop@DEBS’13

STUDENTS

Below I list postdocs, PhD students, and Master's students. In addition to these, I’m fortunate to work with great undergraduate students and occasionally with external students.

Postdocs and PhD Students

  • Qiming Wang
  • Yue Gong
  • Zhiru Zhu
  • Tapan Srivastava
  • Steven Xia
  • Chris Zhu

Master's and Undergraduate Students

  • Alena Zeng
  • Chirag Kawediya

Alumni

  • Kevin Dharmawan (external collaborator, to SBU PhD program)
  • Zach Hempstead (to Anthropic)
  • Sainyam Galhotra (to Cornell, assistant professor)
  • Stanley Zhu (to Google)
  • Alex Zhao (to Citadel)
  • Jenny Long
  • Yintong Ma (to ByteDance)
  • Ipsita Mohanty (to UWaterloo MSc program)
  • Ryan Wong (to UMichigan Undergraduate program)

TEACHING

  • The Value of Data (Fall’20, Fall’21, Fall’22, Fall’23, Spring’24)
  • Ethics, Fairness, Responsibility, and Privacy in Data Science (Spring’20, Spring’21, Spring’22, Spring’23, Spring’24)
  • Introduction to Databases (Winter’20, Winter’21, Winter’22, Winter’23)

SERVICE

  • SIGMOD’24 PC Member
  • CIDR’24 PC Member
  • SIGMOD’23 PC Member
  • SIGMOD’23 Mentorship Co-Chair
  • VLDB’23 PC Member and publicity chair
  • HPTS’22 PC Member
  • SIGMOD’22 PC Member and publicity chair
  • VLDB’22 PC Member
  • VLDB’22 Workshop Co-Chair
  • KDD’21 PC Member
  • SIGMOD’21 PC Member (Demo track)
  • VLDB’21 PC Member (Distinguished Reviewer Award)
  • ICDE’21 PC Member
  • VLDB’20 PC Member
  • SoCC’20 PC Member
  • SIGMOD’19 PC Member (Distinguished Reviewer Award)
  • VLDBJ Reviewer
  • TKDE Reviewer
  • TODS Reviewer
  • SIGMOD Record

BIO

I completed a postdoc at MIT working with Sam Madden. Before that, I obtained my PhD at Imperial College London working with Peter Pietzuch.