PUBLICATIONS
2024
Saving Money for Analytical Workloads in the
Cloud. Tapan Srivastava, Raul Castro Fernandez VLDB 2024 (New)
Solo: Data Discovery Using Natural Language
Questions Via A Self-Supervised Approach. Qiming Wang, Raul Castro Fernandez SIGMOD 2024
Nexus: Correlation Discovery over
Collections of Spatio-Temporal Tabular Data. Yue Gong, Sainyam
Galhotra, Raul Castro Fernandez SIGMOD 2024
Cackle: Analytical Workload Cost and
Performance Stability with Elastic Pools. Matthew Perron, Raul
Castro Fernandez, David DeWitt, Michael Cafarella, Samuel Madden SIGMOD 2024
Responsible Sharing of Spatiotemporal Data Raul
Castro Fernandez, Arnab Nandi SIGMOD 2024 (Tutorial)
Demonstration of Ver: View Discovery in
the Wild Kevin Dharmawan, Chirag Kawediya, Yue Gong, Zaki Indra
Yudhistira, Zhiru Zhu, Sainyam Galhotra, Adila Alfa Krisnadhi, Raul Castro
Fernandez SIGMOD 2024 (Demo)
Demonstrating Nexus for Correlation
Discovery over Collections of Spatio-Temporal Tabular Data Yue
Gong, Raul Castro Fernandez SIGMOD 2024 (Demo)
2023
How Large Language Models Will Disrupt Data
Management. Raul Castro Fernandez, Aaron Elmore, Michael Franklin,
Sanjay Krishnan, Chenhao Tan. VLDB 2023
Data and AI Model Markets: Grand Opportunities for
Data and Model Sharing, Discovery, and Integration. Jian Pei, Raul
Castro Fernandez, Xiaohui Yu. VLDB 2023 (Tutorial)
Saibot: A Differentially Private Data
Search Platform. Zezhou Huang, Jiaxiang Liu, Daniel Gbenga Alabi,
Raul Castro Fernandez, Eugene Wu. VLDB 2023
Addressing Budget Allocation and Revenue
Allocation in Data Market Environments Using an Adaptive Sampling
Algorithm. Boxin Zhao, Boxiang Lyu, Raul Castro Fernandez, Mladen
Kolar. ICML 2023
Data-Sharing Markets: Model, Protocol,
and Algorithms to Incentivize the Formation of Data-Sharing
Consortia. Raul Castro Fernandez. SIGMOD 2023
Metam: Goal-Oriented Data
Discovery. Sainyam Galhotra, Yue Gong, Raul Castro Fernandez.
ICDE 2023
Ver: View-Discovery in the
Wild. Yue Gong, Zhiru Zhu, Sainyam Galhotra, Raul Castro Fernandez.
ICDE 2023
2022
Data Station: Delegated, Trustworthy, and
Auditable Computation to Enable Data-Sharing Consortia with a Data
Escrow. Siyuan Xia, Zhiru Zhu, Chris Zhu, Jinjin Zhao, Kyle Chard,
Aaron Elmore, Ian Foster, Michael Franklin, Sanjay Krishnan, Raul Castro
Fernandez. VLDB 2022
Revisiting Online Data Markets in 2022. A
Seller and Buyer Perspective. Javen Kennedy, Pranav Subramaniam,
Sainyam Galhotra, Raul Castro Fernandez. SIGMOD Record
Enabling AI Innovation via Data and Model
Sharing: An Overview of the NSF Convergence Accelerator Track D.
Several authors. AI Magazine
Protecting Data Markets from Strategic
Buyers. Raul Castro Fernandez. SIGMOD 2022
Leva: Boosting Machine Learning
Performance with Relational Embedding Data Augmentation. Alex Zhao,
Raul Castro Fernandez. SIGMOD 2022
2020
Data Market Platforms: Trading Data
Assets to Solve Data Problems. Raul Castro Fernandez, Pranav
Subramaniam, Michael Franklin. VLDB 2020
ARDA: Automatic Relational Data
Augmentation for Machine Learning. Nadiia Chepurko, Ryan Marcus,
Emanuel Zgraggen, Raul Castro Fernandez, Tim Kraska, David Karger.
VLDB 2020
Starling: A Scalable Query Engine on
Cloud Function Services. Matt Perron, Raul Castro Fernandez, David
DeWitt, Samuel Madden. SIGMOD 2020
A System for Studying Deep Network
Training. Raul Castro Fernandez CIDR’20 (Abstract)
2019
Lazo: A Cardinality-Based Method for
Coupled Estimation of Jaccard Similarity and Containment. Raul
Castro Fernandez, Jisoo Min, Demitri Devada, Samuel Madden.
ICDE’19
Termite: A System for Tunneling Through
Heterogeneous Data. Raul Castro Fernandez, Samuel Madden.
AIDM@SIGMOD’19
Raha: A Configuration-Free Error
Detection System. Mohammad Mahdavi, Ziawasch Abedjan, Raul Castro
Fernandez, Sam Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang
SIGMOD’19
Aurum: A Story About Research
Taste. Raul Castro Fernandez. Making Databases Work. ACM
Morgan & Claypool. 2019
2018
Aurum: A Data Discovery
System. Raul Castro Fernandez, Ziawasch Abedjan, Famien Koko, Gina
Yuan, Samuel Madden, Michael Stonebraker. ICDE’18
Seeping Semantics: Linking Datasets using
Word Embeddings for Data Discovery. Raul Castro Fernandez, Essam
Mansour, Abdulhakim Qahtan, Ahmed Elmagarmid, Ihab Ilyas, Samuel Madden, Mourad
Ouzzani, Michael Stonebraker, Nan Tang. ICDE’18
Meta-Dataflows: Efficient Exploratory
Dataflow Jobs. Raul Castro Fernandez, William Culhane,
Pijika Watcharapichat, Matthias Weidlich, Victoria Lopez Morales, Peter
Pietzuch. SIGMOD’18
Extracting Syntactical Patterns from
Databases. Andrew Ilyas, Joana M. F. da Trindade, Raul Castro
Fernandez, Samuel Madden. ICDE’18
FAHES: A Robust Disguised Missing Values
Detector. Mourad Ouzzani, Nan Tang, Ahmed Elmagarmid, Raul Castro
Fernandez, Abdulhakim A. Qahtan. KDD’18
Building Data Civilizer Pipelines with an
Advanced Workflow Engine. Essam Mansour, Dong Deng, Raul Castro
Fernandez, Abdulhakim Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid,
Ihab Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang.
(Demo) ICDE’18
2017
Quill: Efficient, Transferable, and Rich
Analytics at Scale. Badrish Chandramouli, Raul Castro Fernandez,
Jonathan Goldstein, Ahmed Eldawy, Abdul Quamar. VLDB’17
The Data Civilizer System.
Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, Michael
Stonebraker, Ahmed Elmagarmid, Ihab Ilyas, Samuel Madden, Mourad Ouzzani, Nan
Tang. CIDR’17
A Demo of the Data Civilizer
System. Raul Castro Fernandez, Dong Deng, Essam Mansour, Abdulhakim
A. Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab Ilyas, Samuel
Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang. (Demo)
SIGMOD’17
2016
Ako: Decentralised Deep Learning with
Partial Gradient Exchange. Pijika Watcharapichat, Victoria Lopez
Morales, Raul Castro Fernandez, Peter Pietzuch. SoCC’16
Detecting Data Errors: Where are we and
what needs to be done? Ziawasch Abedjan, Xu Chu, Dong Deng, Raul
Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael
Stonebraker, Nan Tang. VLDB’16
Towards Large-Scale Data
Discovery. Raul Castro Fernandez, Ziawasch Abedjan, Samuel Madden,
Michael Stonebraker. ExploreDB@SIGMOD’16
SABER: Window-Based Hybrid Stream
Processing for Heterogeneous Architectures. Alexandros Koliousis,
Matthias Weidlich, Raul Castro Fernandez, Paolo Costa, Alexander Wolf, Peter
Pietzuch. SIGMOD’16
Java2SDG: Stateful Big Data Processing
for the Masses. Raul Castro Fernandez, Panagiotis Garefalakis,
Peter Pietzuch. (Demo) ICDE’16
2015
Liquid: Unifying Nearline and Offline Big Data Integration. Raul Castro Fernandez, Peter Pietzuch, Joel Koshy, Jay Kreps, Dong Lin, Neha Narkhede, Jun Rao, Chris Riccomini, Guozhang Wang. CIDR’15
2014
Making State Explicit for Imperative Big
Data Processing. Raul Castro Fernandez, Matteo Migliavacca,
Evangelia Kalyvianaki and Peter Pietzuch. USENIX ATC’14
Grand Challenge: Scalable Stateful Stream
Processing for Smart Grids. Raul Castro Fernandez, Matthias
Weidlich, Peter Pietzuch and Avigdor Gal. DEBS’14
2013
Integrating Scale Out and Fault Tolerance in Stream
Processing using Operator State Management. Raul Castro Fernandez,
Matteo Migliavacca, Evangelia Kalyvianaki and Peter Pietzuch. SIGMOD’13 (SIGMOD’23 Test of Time
Award)
Towards Low-Latency and In-Memory
Large-Scale Data Processing. Raul Castro Fernandez and Peter
Pietzuch. PhD Workshop@DEBS’13
STUDENTS
Below I list postdocs, PhD students, and master's students. In addition to these, I’m fortunate to work with great undergraduate students and, occasionally, with external students.
Postdocs and PhD Students
- Qiming Wang
- Yue Gong
- Zhiru Zhu
- Tapan Srivastava
- Steven Xia
- Chris Zhu
Master and Undergraduate Students
- Alena Zeng
- Chirag Kawediya
Alumni
- Kevin Dharmawan (external collaborator, to SBU PhD program)
- Zach Hempstead (to Anthropic)
- Sainyam Galhotra (to Cornell, assistant professor)
- Stanley Zhu (to Google)
- Alex Zhao (to Citadel)
- Jenny Long
- Yintong Ma (to ByteDance)
- Ipsita Mohanty (to UWaterloo MSc program)
- Ryan Wong (to UMichigan Undergraduate program)
TEACHING
- The Value of Data (Fall’20, Fall’21, Fall’22, Fall’23, Spring’24)
- Ethics, Fairness, Responsibility, and Privacy in Data Science (Spring’20, Spring’21, Spring’22, Spring’23, Spring’24)
- Introduction to Databases (Winter’20, Winter’21, Winter’22, Winter’23)
SERVICE
- SIGMOD’24 PC Member
- CIDR’24 PC Member
- SIGMOD’23 PC Member
- SIGMOD’23 Mentorship Co-Chair
- VLDB’23 PC Member and Publicity Chair
- HPTS’22 PC Member
- SIGMOD’22 PC Member and Publicity Chair
- VLDB’22 PC Member
- VLDB’22 Workshop Co-Chair
- KDD’21 PC Member
- SIGMOD’21 PC Member (Demo track)
- VLDB’21 PC Member (Distinguished Reviewer Award)
- ICDE’21 PC Member
- VLDB’20 PC Member
- SoCC’20 PC Member
- SIGMOD’19 PC Member (Distinguished Reviewer Award)
- VLDBJ Reviewer
- TKDE Reviewer
- TODS Reviewer
- SIGMOD Record
BIO
I completed a postdoc at MIT working with Sam Madden. Before that, I obtained my PhD at Imperial College London working with Peter Pietzuch.