PROJECTS OVERVIEW 

The South Big Data Innovation Hub supports large- and small-scale projects, ranging from $1,000 to $1 million, that aim to increase the efficiency and effectiveness of knowledge and technology transfer between individuals, universities, public and private research centers and laboratories, large enterprises, and small and medium-sized businesses.

Each Data Innovation Project will work on a challenge that requires data science ideas, approaches, and solutions. By taking on a convening and synergizing role, as opposed to directly conducting new research, the six Data Innovation Projects, called “Spokes,” will each gather important stakeholders, engage end users and solution providers, and form multidisciplinary teams to tackle large questions no single field can solve alone. However, unlike the Hubs, which aim to span the full range of data-driven challenges and solutions in a geographic region, each Spoke will have a specific, goal-driven mission. 

Seed grant projects are designed to fund PIs to establish communities of practice or working groups, or to provide a connection point between two or more communities, sectors, or solution providers, in order to grow and scale opportunities for the Southern region.

 

PROJECTS IMPACT 

   - $6M+ in funding for large-scale Spoke Projects that impact the Southern region 
   - $250k per year available for Seed Grant funding 

 
All South Hub supported Data Innovation Projects - Spokes and Seed Grants - are listed below.

The HBCU Data Science Consortium

Team Leaders: Jason Black, Thorna Humphries, Velma Latson, Ed Pearson, Felesia Stukes, Alfred Watkins
Collaborators: Florida A&M University, Norfolk State University, Bowie State University, Alabama A&M University, Johnson C. Smith University, Morehouse College
Priority Area: Data Science Education and Workforce

This project was an awardee of the 2020 South Big Data Hub SEEDS Program - Southern Engagement and Enrichment in Data Science. It was the single large award funded at $100K. The PIs seek to build a consortium that provides an accessible and beneficial platform within the HBCU community.


Extending Physics Outreach

Team Leaders: Adam LaMee
Collaborators: University of Central Florida
Priority Area: Data Science Education and Workforce

This project leverages existing K12 activities and partnerships to provide high-quality data science outreach to minoritized high school teachers and teachers of minoritized students. It can be a critical entry point into data-related majors and careers for students underrepresented in those areas.

The project will fund ten high school teachers who are local to the UCF area, are not currently served by QuarkNet, and, along with their students, represent greater racial and gender diversity than is typically served by these existing programs. The ten teachers will work together over five days, with stipends of $1000, to learn and practice fundamental data science skills and design their own implementation plans to reach their ~150 students each year. These curricula will be incorporated into the large QuarkNet set of teacher resources, which reach thousands of high school students annually. Training for teachers includes incorporating QuarkNet coding activities and establishing a partnership with IRIS-HEP.


Evaluation and Demonstration of Open Data Portal Technology for Smart Cities and Data Science for Social Good

Team Leaders: Seema D. Iyer, Michael McGuire
Collaborators: University of Baltimore, Towson University
Priority Area: Smart Cities and the Environment

The proposed project will provide open opportunities to learn about and evaluate a number of open data portal technologies for publishing data for smart cities and data science for social good. The proposed work will build on the Baltimore Data Science Corps NSF HDR project and will develop a number of use cases for serving open data for smart city and urban data applications. The use cases will then be demonstrated on a set of open data hub technologies such as the ESRI Open Data Portal, CKAN, DKAN, and DataVerse. The project will establish criteria for evaluating open data publishing technologies. Using established use cases that align with the South Big Data Hub priorities, the technologies will be implemented and tested on their ability to provide end-to-end support for smart city and urban data science problems. The resulting data will be used in hackathons and competitions within Baltimore, and results will be presented at the Baltimore Data Day workshop. The project will engage stakeholders at the South Big Data Hub and will be a model for publishing and using open data for smart cities and data science for social good.


Data Science Next Steps

Team Leaders: Velma Latson, Ed Pearson
Collaborators: Bowie State University, Alabama A&M University
Priority Area: Data Science Education and Workforce

The project will host a summer academy where freshman and sophomore students from underrepresented populations can gain exposure to data science, computing technology, cybersecurity, and innovation. This project will assess how well students learn these subjects in virtual instructional settings. The project is also interested in how the virtual instructional material can increase capacity for teaching data science, computing technology, cybersecurity, and innovation at HBCUs.


Data4Kids: Virtually Teaching Kids about Data Science

Team Leaders: Jonathan Schwabish, Claire McKay Bowen
Collaborators: Urban Institute and Southern K-12 organizations UTeach Outreach, K12 Education Manager, GoodGeo, K-12 Education Strategy, Policy, and Services, Launch Years Course Programs University of Memphis, Pathway.ai
Priority Area: Data Science Education and Workforce

This project was an awardee of the 2020 South Big Data Hub SEEDS Program - Southern Engagement and Enrichment in Data Science. It is one of two awards funded in the Seed Grant's mid-range category of >$50,000. The PIs seek to launch a website that contains guides, videos, and other content to assist educators and instructors on teaching data science to their students.


Visualization and Data Reuse Challenge — Digital Rocks Portal

Team Leaders: Maša Prodanović, James McClure
Collaborators: UT Austin, Virginia Tech
Priority Area: Data Science Education and Workforce

The visualization challenge will directly increase the capacity for data science education and workforce development. The online visualization mini-course will be followed by a visualization challenge with three categories (video, static image, and 3D printed porous microstructure). Each category will have three prizes, judged by five faculty and researchers selected from Southern universities that have posted on or have been involved in Digital Rocks Portal data curation. Students and young researchers from US universities will be exposed to advanced scientific data visualization, curation, and numerical simulation concepts by participating in the course and/or challenge. All products (mini-course videos, Jupyter Notebooks, scripts, and derivative data) will be posted on YouTube, GitHub, or Digital Rocks Portal as applicable, creating a permanent source of education materials.


Reducing Uncertainty in Risk Projections for Statewide Hospital Capacity in COVID-19

Team Leaders: Richard Braun, Stephen Metraux, Olivia Wanjeri Mwangi
Collaborators: University of Delaware
Priority Area: Health Disparities

Enacting a statewide crisis response to the progression of COVID-19 requires a multiscale data collection, aggregation, and analysis strategy to circumvent variability associated with hyper-local effects of individual hospitals on one end and excessively coarse models based on national averages on the other. Consider the relationship between local health care systems in normal times. Beyond health care privacy laws and corporate rules, different hospital systems are friendly competitors, not used to sharing data about real-time functioning. In contrast, the extraordinary nature of the COVID-19 crisis required statewide coordination among Delaware hospitals to rapidly develop “shotgun” data agreements so that models can be fit with real-time data, balancing need and required effort. Integrating these data into models generates projections that inform constantly changing risk theory analyses, including the need for additional hospital facilities and medical supplies. With this proposal, we seek to develop a framework for data collection, aggregation, integration, and cooperation based on our Delaware case-study that can be used by hospital systems throughout the US to plan resources as the COVID-19 pandemic evolves as well as for future pandemics. The highly interdisciplinary project team consists of experts in Public Policy, Epidemiology, Physics, Computer Science and Industrial Engineering and has strong integration with local health systems in Delaware.


Dekalb Academy, Girl Scouts of America, and Scouts BSA STEM and AI in the Environment Program

Team Leaders: Allison Tomlinson-Grant
Collaborators: Dekalb Academy, Boy Scouts of America
Priority Area: Data Science Education and Workforce

Dekalb Academy and its partners will support approximately 100 students across four troops/packs in year-long STEM projects and camps in conjunction with BSA NOVA awards. In addition, Dekalb will develop, and scout leaders will implement, a new “AI in the Environment” track targeting the use of AI in environment and climate.


Training Next-Generation Data Scientists in Non-Deterministic Scientific Data Generation

Team Leaders: Silvina Caino-Lores
Collaborators: University of Tennessee
Priority Area: Data Science Education and Workforce

The University of Tennessee team will develop and deliver training modules targeted at data science students looking to understand the impact of non-determinism in data science workflows. They will accomplish this by: 1) developing educational materials covering non-determinism and its impact on code executions affecting scientific data generation; 2) building use cases and software tools for the visualization and analysis of non-determinism; and 3) training data science students, professionals, and researchers through tutorials and open educational materials.


The North Carolina Data Science Panel Series

Team Leaders: Taylor Gibson
Collaborators: NC School of Science and Math
Priority Area: Data Science Education and Workforce

The goal of the NC Data Science Panel Series is to showcase the different educational pathways that high school students can take to become data scientists and to highlight the variety of career opportunities available in this field. Through a series of conversations and Q&A, students and teachers will have the opportunity to hear from real data scientists about their work and individual backgrounds. The North Carolina School of Science and Mathematics (NCSSM) will host the NC Data Science Panel Series events, attract participants, and reach out to their business/university connections to recruit speakers for the events.


Human Resources Information Technologists and Data Science Core Research

Team Leaders: Kyle Huff
Collaborators: Georgia Gwinnett College
Priority Area: Data Science Education and Workforce

The project’s goal is to advance the South Big Data Innovation Hub’s goal of increasing capacity for data science education and workforce development. It will do so by determining gaps between the data science preparation of students and the workplace’s expectations for seven Business majors, thereby contributing to the South Hub’s clearinghouse of databases and best practices in the area of data science education.


HBCU Data Science Student Ambassadors Initiative

Team Leaders: Jason Black, Thorna Humphries, Velma Latson, Ed Pearson, Felesia Stukes, Alfred Watkins
Collaborators: Florida A&M University, Bowie State University, Alabama A&M University, Johnson C. Smith University, and Morehouse College
Priority Area: Data Science Education and Workforce

Co-led by five leads from Southern institutions (Florida A&M University, Bowie State University, Alabama A&M University, Johnson C. Smith University, and Morehouse College), the HBCU Data Science Consortium will create an HBCU Data Science Student Ambassadors program, including student chapters at established HBCUs. The program will foster each campus data science community to promote education, research, inclusivity, and mentoring in data science. The initiative seeks to broaden data science participation. By establishing faculty-advised student data science organizations/clubs, this project seeks to strengthen student interest in data science and help train faculty in developing data science initiatives on their campuses.


Empowering Southern Communities with a Smart Data Hub: A Case Study of Homeownership Inequalities

Team Leaders: Arya Farahi
Collaborators: University of Texas at Austin
Priority Area: Smart Cities and the Environment

The University of Texas at Austin team will design and deploy an open-source and actionable Smart Data Hub (SDH) to harness society’s core services potential. The grant targets the housing sector, with a focus on homeownership. The SDH will: 1) democratize data science practices and make data and analytics accessible to and actionable for researchers, policy-makers, educators, and the public alike; 2) demonstrate an application of AI-enhanced policy evaluation in redressing inequalities in the age of big data; and 3) shape public opinion by democratizing data visualization, inference, and policy evaluation with a unified system. 


Warren Wilson College and a Data-Driven Community

Team Leaders: Holly Rosson
Collaborators: Warren Wilson College
Priority Area: Data Science Education and Workforce

As one of ten federally recognized work colleges, Warren Wilson seeks to establish a strong partnership between data science academic coursework and the College’s community engagement program. This grant will provide students with real-world examples of how data science connects to professional or charitable work they do for their communities, both now and in the future. Students will work with community partner Bounty and Soul to: 1) combine data on population distribution, social vulnerability, and morbidity to best match their referrals’ nutritional needs; and 2) indicate where high-speed internet is available in western North Carolina to target areas for possible online nutrition classes.


Bridging Education in Big Data Neuroscience from the Midwest to the South

Team Leaders: Josiah Leong
Collaborators: University of Arkansas
Priority Area: Health Disparities

This University of Arkansas grant will fund travel for 16 students and 2 faculty speakers to the Advanced Computational Neuroscience Network (ACNN) Workshop. This workshop includes researchers from southern universities who are dedicated to advancing neuroscience research and education. Four awards will go to doctoral or undergraduate students from HBCUs. Thus, the small grant funding will increase access to education in neuroscience and data science and create partnerships between HBCUs and neighboring universities.


Blockchain for Equitable and Sustainable Development Pilot Program for Appalachia

Team Leaders: Jason Xiong
Collaborators: Appalachian State University
Priority Area: Data Science Education and Workforce



Antiracist Research Workshop Proposal

Team Leaders: Dr. Trenette Goings
Collaborators: University of North Carolina at Chapel Hill
Priority Area: Health Disparities

The UNC-Chapel Hill team will hold a workshop to provide researchers with antiracist application tools for health disparities research. Key principles of antiracist research will be explored, and activities and exercises will be created to engage participants in thinking about how these principles relate to their own research frameworks and approaches. Participants will engage in exercises to help them strategize how to solve problems they may experience when undertaking antiracist research.


Training the Next-Generation Parallel and Distributed Programming Workforce with the Help of T-thinker

Team Leaders: Da Yan
Collaborators: University of Alabama at Birmingham
Priority Area: Data Science Education and Workforce

The University of Alabama at Birmingham lead will organize a series of events to introduce T-thinker to a broader audience in the US and around the world. Because T-thinker is a promising next-generation programming model, students and big data practitioners should study its concepts and skills to stay competitive in big data careers. Domain experts in the sciences and industry should also be aware of the new T-thinker technology so they can upgrade their big data/cloud processing pipelines.


Data-Driven Streets in Washington, DC

Team Leaders: Karthik Balasubramanian
Collaborators: Data Driven Streets (volunteer collaboration between Howard U faculty Karthik Balasubramanian, American U faculty Chris Parker, and Meta Data Scientist Charlotte Jackson)
Priority Area: Smart Cities and the Environment

The Howard University team will educate traffic engineers on the true scope and depth of crashes in Washington, DC. In addition, using a combination of public and private data about DC streets, they hope to illuminate the effect of traffic calming and enforcement measures on outcomes of interest other than crashes, such as traffic volumes, speed, and people-throughput, and to gain clear insight into how infrastructure and policy decisions impact these measures.


Building Equity Analysis Capacity with the Spatial Equity Data Tool

Team Leaders: Alena Stern, Sonia Torres Rodriguez
Collaborators: Urban Institute
Priority Area: Smart Cities and the Environment

The team at the Urban Institute will create a resource library to enable government and nonprofit organizations across the Southern United States to embed rigorous equity analysis into their work using Urban’s Spatial Equity Data Tool (SEDT) for equity analysis. The SEDT is a powerful, user-friendly tool that enables users to upload their own data and quickly analyze whether place-based resources – such as job training sites or greenspace – are equitably distributed across neighborhoods and demographic groups. The tool makes equity analyses that otherwise could have been time-consuming or cost-prohibitive broadly accessible to government and nonprofit stakeholders. The equitable distribution of resources, made more possible by consistent and standardized measurement, will address structural and systemic inequities by improving the lives of marginalized communities across neighborhoods and demographic groups.


CyberAI: An Interactive Training Platform on AI and Cybersecurity for Underrepresented Communities

Team Leaders: Uttam Ghosh
Collaborators: Meharry Medical College
Priority Area: Data Science Education and Workforce

The proposal team at Meharry Medical College will design and develop CyberAI, an open and interactive training platform on AI and Cybersecurity for underrepresented communities. CyberAI anticipates having training courses and modules designed for a diverse group of users. These include fundamental courses that target high school and community college students; mid-level courses for middle and high school teachers, university students, and professors; and advanced-level hands-on courses for community college faculty, post-docs, and faculty from teaching-oriented universities who wish to incorporate AI and cybersecurity in their curriculum. CyberAI will promote cyber awareness and help participants learn about the emerging field of AI and Cybersecurity. Initially, the training is planned to be conducted from uploaded lectures, demos, and videos. The platform will be further extended to provide user manuals, assignments, one-to-one mentoring sessions, and live lecture sessions by experts and invited speakers.


Catalyzing Data Innovation and Distributed Talent Management Cyberinfrastructure to Revolutionize the Public Workforce System in the U.S. South

Team Leaders: Julie Dunlap
Collaborators: Fathom
Priority Area: Data Science Education and Workforce

The project team at Fathom will convene workforce directors, providers, and clients for knowledge-sharing and participatory co-design sessions. The outcomes of these sessions will be used to create and disseminate a toolkit of recommendations, best practices, and open products that public workforce agencies can use to experiment with and adopt distributed talent management technologies, processes, and infrastructure. In turn, this project will help to reduce barriers to data innovation across the region, promote the adoption of cyberinfrastructure and open products for talent management, and enhance the capacity of public sector data science education and workforce development.


Equitable and Explainable AI in the Healthcare Space

Team Leaders: Korin Reid
Collaborators: Pauline Reid Research Foundation
Priority Area: Health Disparities

Dr. Korin Reid of the Pauline Reid Research Foundation will implement a small research project on Equitable and Explainable AI. Dr. Reid has developed numerous AI models for predicting chronic disease risk, and an open area of interest is determining which method is most suitable for ascertaining that those models work effectively across all demographic groups. If models do not work effectively across all demographic groups, data science solutions risk exacerbating existing health disparities. Additionally, Dr. Reid will develop interactive clickable maps to display health disparity and population health information by region and demographic group, with granularity as fine as the county level.


The Open Storage Network

Team Leaders: Michael Norman (PI), Stanley Ahalt, James Glasgow, John Goodhue, Alexander Szalay
Priority Area: Data Sharing and Cyberinfrastructure

The Open Storage Network (OSN) provides value to multiple disciplines, ranging from moving large astronomy datasets to compute resources to datasets geared towards strengthening machine learning research. Our users constitute a varied community of researchers and practitioners. The project is a nationally coordinated effort led by the South Big Data Hub and supported by the Northeast, West, and Midwest Big Data Hubs.


Large Scale Medical Informatics for Patient Care Coordination and Engagement

Team Leaders: Gari Clifford (PI), Christopher Rozell, Herman Taylor, Donald Adjeroh, Ahmed Abbasi, Nitin Agarwal
Collaborators: Emory University, Georgia Tech, Morehouse School of Medicine, UT Dallas, West Virginia University, University of Virginia
Priority Area: Health Disparities

This project brings together six universities to design and construct a patient-focused and personalized health system that addresses the fractured nature of healthcare information and the lack of engagement of individuals in their own healthcare. As its first pilot, the researchers will focus on African Americans and Hispanics/Latinos diagnosed with cardiovascular disease.


Smart Grids Big Data

Team Leaders: Mladen Kezunovic (PI), Santiago C. Grijalva, Zoran Obradovic
Collaborators: Texas A&M University, Georgia Tech, Temple University
Priority Area: Smart Cities and the Environment

The introduction of new technologies for monitoring electrical power grids has led to an abundance of data that can be used to improve power generation and transmission and to enhance customer service. However, this data is still vastly underutilized. This project aims to increase our understanding of the merged data collected from physical systems in order to better understand how energy flows through grids, how to prevent emergencies such as blackouts and brownouts, and how to improve asset management and increase energy efficiency.


DEDICATE: Data Science Equity-Driven Inquiry to Create Accessible Project-based Training for Social Impact Education

Team Leaders: Renata Rawlings-Goss, Marc Boumedine, Uma Ravanasamudram, Earvin Balderama
Collaborators: Georgia Tech, University of Virgin Islands, North Carolina Central University, Cal State-Fresno
Priority Area: Data Science Education and Workforce

DEDICATE aims to solve some of the challenges to broadening participation in STEM at a systems level, specifically: 1) instructor-designed “projects” not based in Project-Based Learning (PBL) research; 2) silos of projects across STEM and non-STEM disciplines; 3) lack of faculty-centric design; and 4) lack of systemic-change networks. DEDICATE will engage up to 90 faculty across 30 faculty teams to train over 3,000 students across the Southern census region of the U.S. and California, which together cover over 90 of the 101 HBCUs and 163 of the 451 HSIs.


Enhanced 3-D Mapping for Habitat, Biodiversity, and Flood Hazard Assessments of Coastal and Wetland Areas of the Southern US

Team Leaders: Frank Muller-Karger, James Gibeaut, Timothy Dixon
Collaborators: University of South Florida, Texas A&M - Corpus Christi
Priority Area: Smart Cities and the Environment

The vision of this project is that communities occupying low-lying coastal areas of the southern US will be protected and develop in a sustainable manner through planning based on knowledge, conservation, and wise use of sensitive lands. Researchers from the University of South Florida’s College of Marine Science and the School of Geosciences at Texas A&M University – Corpus Christi, along with Google Earth Engine, are collaborating with the South Big Data Hub through this project to develop more accurate, ultra-high resolution topographic, land cover, and urban environment geospatial products. The project examines in detail areas that were directly impacted by Hurricanes Harvey and Irma in 2017 and identifies flood-prone areas across the region.


Smart Privacy for Smart Cities: A Research Collaborative to Protect Privacy and Use Data Responsibly

Team Leaders: Jules Polonetsky
Collaborators: Future Privacy Forum
Priority Area: Smart Cities and the Environment

The long-term vision of the project is to help municipal leaders strengthen their ability to collect, use, and share data in a responsible manner. This will help grow privacy-preserving innovations across applications and geographic boundaries for the public good. In this way, the Smart Privacy for Smart Cities Spoke will serve to increase public knowledge, understanding, and engagement with privacy-related concerns, and ultimately, to promote the public’s trust in smart city technologies and in their local government.


Integrating Biological Big Data Research into Student Training and Education

Team Leaders: Hon Qin, Donald Adjeroh, Mentewab Ayalew, Fan Wu, Azad Hossain, Yu Liang, Joey Shaw, Hope Klug, Jennifer Boyd
Collaborators: University of Tennessee Chattanooga, West Virginia University Research Corporation, Spelman College, Tuskegee University
Priority Area: Data Science Education and Workforce

The project is a collaborative effort among the University of Tennessee Chattanooga, Tuskegee University, Spelman College, and West Virginia University to integrate and automate biological big data into student training and education. The project will offer training workshops, engage faculty and students in developing a protocol to automate field data collection, and will prototype automated methods to enhance plant digitization.


Keeping Data Science Broad

Team Leaders: Renata Rawlings-Goss
Collaborators: Georgia Institute of Technology
Priority Area: Data Science Education and Workforce

The goal of this series is to garner community input into pathways for keeping data science as a discipline broadly inclusive. We seek input from data science programs in any region across the nation, either traditional or alternative, and from a range of institution types including minority-serving institutions, community colleges, liberal arts colleges, tribal colleges, universities, and industry partners.


Using Big Data for Environmental Sustainability: Big Data + AI Technology = Accessible, Usable, Useful Knowledge

Team Leaders: Ashok Goel, Jennifer Hammock
Collaborators: Georgia Institute of Technology
Priority Area: Smart Cities and the Environment

As the effects of environmental degradation and climate change grow, the need for research and education in biological diversity, ecological modeling and environmental sustainability becomes critical. This project brings together scientists from a dozen institutions in academia, government, and industry to translate big data into meaningful knowledge that supports research and education in environmental sustainability. The project will focus on the Encyclopedia of Life (EOL), the world’s largest database of biological species, and other biodiversity data sources. Its goals are to make this data more accessible and usable by integrating artificial intelligence tools, modeling, and simulation.