Accelerating the Pace of Big Data

Just how Big is Big Data? It’s difficult to wrap our heads around it. We now carry in our pockets computers (a.k.a. smartphones) that have 1 million times more memory than NASA’s Apollo Guidance Computer, which was used to land the first human beings on the moon. The world’s most powerful supercomputer is Summit, housed at the Oak Ridge National Laboratory. It can perform 200 quadrillion (2 x 10^17) calculations per second.

It’s an indescribable number, somewhere between the number of ants alive on earth at one time and the number of grains of sand on earth. Suffice it to say, today’s researchers have some freakin’ amazing computer power at their disposal.

The truth, however, is that many scientific fields have yet to realize the potential of Big Data. That’s where the National Science Foundation’s (NSF) Big Idea, Harnessing the Data Revolution (HDR), comes in. This $30 million initiative is among 10 Big Ideas identified by NSF in 2017. Projects funded by this initiative are doing everything from improving our chances of detecting dark matter to resolving the tree of life.

Coordinating the building of a national data infrastructure, sharing best practices and identifying gaps that need to be addressed by the community is no small effort. In April, 160 researchers — all of whom are Principal Investigators on HDR-funded projects — gathered online for the KI-facilitated 2020 HDR All-hands Meeting. Organizers from the NSF Big Data Innovation Hubs converted the meeting, originally scheduled to take place in the Washington, D.C. area, to a virtual event due to the COVID-19 pandemic.

“We want this to be a connected ecosystem so that results and findings and expertise that are generated in one project really can be circulated and built upon,” says Renata Rawlings-Goss, Ph.D., Executive Director of the South Big Data Innovation Hub (SouthBDHub), one of four regional Hubs across the country. “This meeting was about identifying projects, community building and forming collaborations — especially among those from disciplines that don’t normally work together.”

Tufts University’s Lenore Cowen, Ph.D., was impressed by the number of collaborators she was able to meet at the virtual meeting. “We are already pursuing some of these collaborations and others we are just beginning to follow up on as the semester ends, summer begins and we have more time to talk further,” says Cowen, Professor of Computer Science and Director of T-Tripods.

Given the fairly recent start of the HDR Big Idea, Cowen says it was important not to cancel this meeting. “This is very new territory and the community got a lot out of it.”

The University of Utah’s Chris Meyers, Ph.D., says the meeting was perhaps even better virtual than in-person. “It was very seamless to move from discussions. The KIStorm software was great. I plan to use this as a model for future virtual meetings,” says Meyers, who is a Professor of Electrical and Computer Engineering.

Anastasios Sidiropoulos, Ph.D., agreed. “I was hoping to meet new colleagues, exchange ideas, and learn interesting research directions that the HDR projects are pursuing. My expectations were met fully,” says Sidiropoulos, an Associate Professor of Computer Science at the University of Illinois, Chicago. “I think virtual meetings like this are the future of scientific discourse.”

By Camille Mojica Rey, PhD, Science Communications Director, Knowinnovation

Spoke Project Addresses COVID-19 with VERA

The NSF Spoke Project ‘Using Big Data for Environmental Sustainability: Big Data + AI Technology = Accessible, Usable, Useful Knowledge’ has repurposed VERA to model the effect of social distancing on the spread of COVID-19, including the SIR model of epidemiology. VERA enables a user to build conceptual models and agent-based simulations, and conduct “what if” virtual experiments.

Read the white paper abstract below:

COVID-19 continues to spread across the country and around the world. Current strategies for managing the spread of COVID-19 include social distancing. We present VERA, an interactive AI tool, that first enables users to specify conceptual models of the impact of social distancing on the spread of COVID-19. Then, VERA automatically spawns agent-based simulations from the conceptual models, and, given a data set, automatically fills in the values of the simulation parameters from the data. Next, the user can view the simulation results, and, if needed, revise the simulation parameters and run another experimental trial, or build an alternative conceptual model. We describe the use of VERA to develop a SIR model for the spread of COVID-19 and its relationship with healthcare capacity.

View the project and white paper ‘VERA_Epidemiology – White Paper 1: Using VERA to explain the impact of social distancing on the spread of COVID-19’ HERE
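For readers unfamiliar with the SIR model mentioned above, the following is a minimal sketch of the idea, not VERA's implementation. It is a textbook compartmental SIR model stepped one day at a time, whereas VERA generates agent-based simulations. The function names, the `contact_reduction` parameter used to stand in for social distancing, and all numeric values are illustrative assumptions, not figures from the white paper.

```python
def simulate_sir(population, initial_infected, beta, gamma, days,
                 contact_reduction=0.0):
    """Step a basic SIR model forward one day at a time (forward Euler).

    beta  : assumed daily transmission rate without distancing
    gamma : assumed daily recovery rate
    contact_reduction : hypothetical knob in [0, 1]; 0.5 means contacts
                        (and so transmission) are cut in half
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    effective_beta = beta * (1.0 - contact_reduction)
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contacts between S and I.
        new_infections = min(effective_beta * s * i / population, s)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def peak_infected(history):
    """Largest number of simultaneously infected people in a run."""
    return max(i for _, i, _ in history)


def days_over_capacity(history, capacity):
    """Count days on which infections exceed a hypothetical care capacity."""
    return sum(1 for _, i, _ in history if i > capacity)


if __name__ == "__main__":
    # Illustrative values only; they are not fitted to any data set.
    baseline = simulate_sir(1_000_000, 10, beta=0.3, gamma=0.1, days=365)
    distanced = simulate_sir(1_000_000, 10, beta=0.3, gamma=0.1, days=365,
                             contact_reduction=0.5)
    print("peak infected, no distancing:  ", round(peak_infected(baseline)))
    print("peak infected, with distancing:", round(peak_infected(distanced)))
```

Comparing the two runs is a small "what if" experiment of the kind the abstract describes: lowering the contact rate flattens the infection peak, which is the quantity that matters when it is compared against healthcare capacity.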

(CLOSED) The Program to Empower Partnerships with Industry and Government (PEPI-G) Applications

Applications are closed!

PEPI-G supports data faculty members, research scientists, postdocs, and graduate and undergraduate students from across the country in working on high-level problems for the federal government. Our 2020 program partner is the Department of Homeland Security – Advanced Research Projects Agency (DHS-ARPA).

Qualifying applicants must be: (1) a US CITIZEN, (2) an academic professional (i.e., faculty, postdoctoral researcher, research scientist, graduate student, or rising junior or senior undergraduate), (3) able to pass a DHS background check (suitability) and (4) able to live and work in Washington, D.C. for the duration of the fellowship.

Applicants can request 3-6 months for the fellowship.

Selected individuals will receive a stipend of $5,000/month to offset the costs of relocating to Washington, D.C.

For questions, feel free to contact the South Big Data Hub Program Coordinator, Kendra Lewis-Strickland, at klewis-strickland@gatech.edu.

(CLOSED) DataUp Program 2020 Cohort Applications

The South Big Data Innovation Hub is excited to host the 2020 DataUp Program. DataUp will offer hands-on training for instructor teams at minority-serving institutions, community colleges, or 4-year liberal arts colleges. Priority will be given to hosts who can demonstrate the participation of faculty from diverse departments, or multiple institutions of the types listed above.

Applications Close March 31! (Applications Closed)

Applicants must be groups of 2-4 faculty or permanent staff from a minority-led or minority-serving institution, a primarily teaching institution, a community college or a 4-year liberal arts college, or from a group of such institutions in the same local area, so that each location maintains a cohort model. Lead institutions must be in the 16 states of the South Hub and must indicate interest in hosting a training workshop at their institutions. To ensure broad and adequate attendance, institutions will submit, as part of the application process, a list of potential attendees or partners from their local communities, which should include local minority-serving institutions and community colleges. Institutions with the broadest reach will be selected for participation.

Interested? Learn more about the 2018 cohort’s experiences at the DataUp webpage.

iCompBio REU and Workshop Provides Training to Students and Faculty

During the week of July 29-Aug 2, 2019, more than 50 faculty and students from more than 21 institutions participated in two R bootcamps at the University of Tennessee at Chattanooga (UTC). The iCompBio REU is supported by NSF Award 1852042, REU Site: ICompBio – Engaging Undergraduates in Interdisciplinary Computing for Biological Research. The first bootcamp, on data wrangling using R, was taught by Hong Qin, a computational biologist at UTC. Materials for this R Data Wrangling bootcamp are available at a public GitHub repository, https://tinyurl.com/UTC-R-camps2019. The second bootcamp, Electronic Health Records, was taught by Elvena Fong and Zhuqi Miao from the Center for Health Systems Innovation at Oklahoma State University.

For discussion on research and education in bio big data, please join a LinkedIn group at https://www.linkedin.com/groups/12279083/

Participants from the first bootcamp on data wrangling using R taught by Hong Qin, a computational biologist at the University of Tennessee at Chattanooga (UTC). The bootcamp is supported by the NSF Big Data Spoke award, Integrating Biological Big Data Research into Student Training and Education.

Participants from the second bootcamp, Electronic Health Records, taught by Elvena Fong and Zhuqi Miao from the Center for Health Systems Innovation at Oklahoma State University. The bootcamp is supported by the NSF Big Data Spoke award, Integrating Biological Big Data Research into Student Training and Education.

Biological REU

From May 27 to August 5, 2019, a group of 12 students participated in a 10-week Interdisciplinary Computing for Biological Research REU program at the University of Tennessee at Chattanooga. These undergraduate researchers are from Fisk University, Tuskegee University, Morehouse College, Norfolk State University, the University of the Virgin Islands, Tennessee Technological University, Rhodes College, Shippensburg University of Pennsylvania, and the University of Tennessee at Chattanooga. The students’ majors include 3 in Mathematics, 2 in Chemical Engineering, 3 in Biology, 1 in Biochemistry, 2 in Computer Science, and 1 in Computer Engineering. iCompBio19 includes a total of 8 faculty mentors, who come from Computer Science, Mathematics, Biology, Geology, and Chemical Engineering.

All students presented their REU research results at a poster symposium on July 31.

The 12 student participants in the 10-week Interdisciplinary Computing for Biological Research REU program at the University of Tennessee at Chattanooga. The REU was supported by the NSF Big Data Spoke award, Integrating Biological Big Data Research into Student Training and Education.

The 12 student participants of the 10-week Interdisciplinary Computing for Biological Research REU program presenting their REU research results at a poster symposium on July 31. The REU was supported by the NSF Big Data Spoke award, Integrating Biological Big Data Research into Student Training and Education.

2019 PEPI-G Fellows Selected

The Program to Empower Partnerships with Industry and Government (PEPI-G) supports faculty members, research scientists, postdocs, and graduate and undergraduate students (rising juniors and seniors as of 2019) from the 16 states that comprise the South Big Data Regional Innovation Hub (South BD Hub).

2019 Fellows

James Stevenson is an undergraduate student at Northern Kentucky University, currently pursuing a degree in Information Technology with a focus on cybersecurity. He’s a technologist at heart and enjoys everything related to cyberinfrastructure, social cybersecurity, the internet of things, and data manipulation. His goals for his senior year of college are to gain professional experience in his career field and to develop his technical skills. This fellowship provided by the South Big Data Hub will allow him to reach these goals.

Rachel St Clair is a doctoral student at Florida Atlantic University studying Complex Systems and Brain Sciences. Rachel’s main focus is multi-modal, translational machine learning in complex systems and brain sciences. Her background in both medicine and biology helps structure the integration of machine learning models for both academic and industry applications. Her previous work spans a variety of research fields, including mental disorder diagnosis, epileptic mice investigations, and synthetic drug detection. These interdisciplinary experiences drive her current integrative research in deep learning proteomics, computer vision, and therapeutic XR platforms. In the future, she aims to contribute advances in machine perception and general AI. Rachel notes, ‘working with others who care deeply for the evolution of computerized cognitive task and their role in making the world a safer place would be a defining historical moment in my career path’.

2019 Partner

The Department of Homeland Security – Advanced Research Projects Agency (DHS-ARPA) DA-E lab infrastructure consists of industry-standard servers and network gear, custom appliances built on premises, and commercial and private cloud capabilities.

DHS’ identified Priority Areas:

  • Human Trafficking – Examining social media to aid in the fight against human trafficking focusing on Non-Text Data, Automating Search and Scalability
  • Real-time Analytics for Multi-party, Metro-scale Networks (RAMMMNets) – Data associated with the Internet-of-Things presents challenges to the analytic environments that inform human decision making.
  • Other Topics – Faculty fellows may propose other research topics for consideration.

Online Training with STIPEND SUPPORT Opportunity: ‘Big Data + High-Performance Computing + Atmospheric Sciences’

Call for Participants: NSF-funded Multidisciplinary Online Training Program with Stipend Support in Spring 2019 on Big Data + High-Performance Computing + Atmospheric Sciences

Funded by an NSF grant to train graduate students, postdocs, and junior faculty on “Big Data + High-Performance Computing + Atmospheric Sciences”, our training program is a new initiative in big data applied to atmospheric sciences, using high-performance computing as a vital tool. The training consists of instruction in the areas of data, computing, and atmospheric sciences supported by teaching assistants, followed by faculty-guided project research in a multidisciplinary team of participants from each area. Participants from around the nation will be exposed to multidisciplinary research experiences and have the opportunity for significant career growth.

The DataUp Workshop – Instructor Training: Inspiring Professional Development & Capacity-Building

Faculty teams from the DataUp program during the Instructor Training Workshop on Nov 6 & 7, 2018.

Society is becoming increasingly data-driven and data-literate. It is vital that every institution have the capabilities and infrastructure to engage and develop learners prepared to interact and succeed in such a society. Numerous studies have identified the expanding data divide between institution types and the need to develop successful bridge initiatives. The South Hub began to address this need by creating a 3-part program, DataUp. Through this program, the South Hub is directly impacting each participating institution’s data science education capacities.

The first component of the program is a hosted 2-day data or software workshop presented by the Carpentries. This provided an opportunity for each participating institution to engage in a workshop that specifically addressed its data knowledge gaps (for more information on these workshops, Click Here). Through these intensive workshops, students gain hands-on training and exposure to principles and tools, such as shell and JupyterHub. Removing the associated ‘fear factors’ empowers learners to take on challenges with data. The second component of the DataUp program is a 2-day pedagogy-intensive instructor training.


Strategies for hiring and maintaining a diverse data scientist workforce

RTI’s Kristina Brunelle (left) moderates a panel discussion with Amy Roussel, RTI (center); Gracie Johnson-Lopez, Diversity and HR Solutions (right); and Sackeena Gordon-Jones, Transformation Edge and NC State University (on screen).

Data science is hot. That’s good news for workers with data science skills. It also means organizations competing to hire data scientists need to understand how to recruit talent that will solve their data science challenges and contribute to creating a productive and diverse workforce.