NSDL reaches out to individuals and organizations by exhibiting, attending and presenting at national and international STEM meetings and conferences. Read current first-hand reports about NSDL-on-the-road including photographs!


AND. . . the Zia 2008 NSDL Annual Meeting Haiku

National Science Digital Library projects, partners and pathways met in Washington, D.C. Sept. 30-October 2, 2008 to discuss new directions for the 8-year-old NSF initiative designed to leverage online educational STEM opportunities for students and learners of all ages nationwide. Many partners and projects have been part of NSDL since 2000, and as usual, the conversations and collaborations in and around sessions were highly valued by attendees as a way to catch up professionally and personally. A brief report from NSDL’s Annual Meeting earlier this week featuring community-contributed Haiku can be found here. As is traditional, here are the annual poetic musings of NSF Program Director Lee Zia (currently on leave from NSF). You can view past Zia Haiku here.

Let content be free;
Create value with context.
Service you can sell!

When nouns become verbs:
To google, or go ogle?
The hit list beckons.

Who learns what, and how?
Social graph meets concept map.
Make my dream come true!

Change is in the air.
But the learner still comes first;
Begin with the end.

Views are author’s own.
Not official policy
Of the NSF.

Posted in Topics: Annual Meeting 08, Education, Science, Social Studies, Technology

NSDL Annual Meeting 08: Where in the World is Fedora

Since 2000, National Science Digital Library (NSDL) projects and partners have demonstrated multiple ways to provide high-quality resources and tools that support innovations in teaching and learning. This year’s NSDL Annual Meeting promises to highlight even more of these ongoing cyberlearning initiatives that in total have increased the educational value of national investment in digital library initiatives. Global institutional library and archive communities are interested in what NSDL has learned over the last eight years: how to effectively re-purpose both technology and information for K16 educational audiences.

Fedora Commons is one such knowledge community. More than half of the Fedora Commons global community of users and developers are from large international, public or academic libraries and archives. As a member of the Fedora Commons community, NSDL is one of many organizations that rely on Fedora Commons open source repository software to create an underlying architecture for systems like NCore, the suite of technologies and standards that form a framework for NSDL’s digital library infrastructure.

The authors of The Academic Library in a 2.0 World (1), a research bulletin published by EducauseConnect, have suggested that libraries will increasingly be called on to prove their value to learning, teaching, and research by demonstrating tangible outcomes and evolving their structures, processes, services, and staff roles to accommodate the changes occurring in publishing and communication.

NSDL-writ-large has developed expertise in re-purposing and delivering institutional knowledge for teaching and learning since 2000. Partnering with NSDL projects is one answer to how libraries and archives in many parts of the world might begin to address an outreach mission that is new to some of them.

As one UK institutional archivist and librarian once said to me, “The idea that we should be in the business of marketing, creating new products, and providing open access to what’s inside of formerly well-guarded fortresses of knowledge is new for some of us.”

Take Oxford, for example.

Recently Sarah Thomas, The 26th Bodleian Librarian (that would be 26th over the last 9 centuries), and Director of Oxford University Library Services at Oxford University, addressed former colleagues at Cornell University about the differences between what she described as Oxford’s very old institutional library system (900 years, 10,000 medieval manuscripts, and four Magna Cartas) and Cornell’s “young” library system (about 150 years, one Magna Carta and a handful of medieval manuscripts). I was struck by the vast cultural differences she described between these two venerable library systems located across an ocean from one another, but with the same closely held ideals for institutional support of scholarship and learning.

The University of Oxford is rich in daily academic ritual among well-known buildings designed by early architects like Sir Christopher Wren. Time is even reckoned differently. Three-month-long “terms” (Michaelmas Term, Hilary Term, and Trinity Term) create a scholarly pace of life, in which many students and faculty ride bikes from place to place, that is in tune with the idea of quiet study over long periods. Academic ritual at American universities is more likely to involve loud music, multi-tasking, media and access to online social networks that change at the speed of a keystroke.

The 30,000 external and 60,000 internal Oxford registered “readers”, whom we might understand as “users”, like printed materials a lot. Many of Oxford’s library systems deliver physical books to people who are in library buildings because almost no one is allowed to take a book out of one of Oxford’s many libraries. Yet because Oxford is a UK legal depository (a copy of every publication, including electronic and other non-print material, is required by law to be deposited in a national library to ensure that this information is available for future generations), Thomas feels an obligation to people throughout the UK to make Oxford’s valuable knowledge resources more accessible. Only a small percentage of Oxford’s collections is digitized, and internet access is not always available on campus. She is committed to making the needs of users, and readers, a part of the Oxford Libraries’ tradition in the future.

She concluded with the question, “How do we go forward and benefit from this remarkable past?”

Part of the answer lies in Fedora-based projects at Oxford like Forced Migration Online (FMO), a project coordinated by a team based at the Refugee Studies Centre, Oxford Department of International Development (QEH), University of Oxford. FMO aims to give comprehensive information in an impartial environment and to promote increased awareness of human displacement issues to an international community of users.

To find out more about the Fedora Commons community of libraries and archives in some surprising places and cultures worldwide, come to “Fedora Commons Educational Repository Projects” at NSDL’s Annual Meeting, October 1, from 3:30-4:00 p.m. in the Capital Room, Omni Shoreham Hotel, Washington, D.C.

(1) Wawrzaszek, Susan, and David G. Wedaman. “The Academic Library in a 2.0 World” (Research Bulletin, Issue 19). Boulder, CO: EDUCAUSE Center for Applied Research, 2008, available from http://www.educause.edu/ecar.

Posted in Topics: Annual Meeting 08, Education, Fedora, Repositories, Social Studies, Technology

“Sharing Evaluation Expertise and Results” Annual Meeting Session

Wondering how to design an evaluation with limited or no funding? Have data you’ve been collecting but don’t know what to do with? Bring your questions to the 1:15 session on Wednesday, Oct. 1, 2008 (Sharing Evaluation Expertise and Results). We’ll discuss evaluation and take a stab at answering your questions.

Posted in Topics: Annual Meeting 08, Social Studies

NSDL Annual Meeting 2008

This year’s Annual Meeting with the theme “STEM Research and Education in Action” will be held at the Omni Shoreham Hotel in Washington, D.C. from September 30 - October 2, 2008.

The schedule includes presentations and overviews from NSDL Pathways and partners who are implementing NSDL resources and tools for K12 and undergraduate classrooms nationwide. Stay tuned to this blog for up-to-date meeting notes and preview information.

Posted in Topics: Annual Meeting 08, Education, Mathematics, Open Source, Repositories, Science, Social Studies, physics

American Association of Physics Teachers (AAPT) Summer Meeting

Screenshot from SciQ channel

“Invisibility. Teleportation. Mind reading. Psychokinesis. Time travel. Star ships. Parallel universes. Normally, these would be dismissed by scientists as being impossible. One hundred years ago, the same was thought about lasers, televisions and visiting outer space.” This is the opening of Michio Kaku’s abstract from his presentation to AAPT entitled “Physics of the Impossible.” Dr. Kaku can also be seen on Sci Q Sundays every Sunday night.

By Pat Viele, Physics & Astronomy Librarian for the Edna McConnell Clark Physical Sciences Library at Cornell University

Edmonton, Alberta, site of the AAPT Summer Meeting (http://www.aapt.org/Events/SM2008/index.cfm) held from July 19-23, 2008, was very colorful with huge fields of bright yellow canola plants (used to make a type of edible cooking oil) in full bloom. The plants only flower for about one week, so our timing was good.

First on my agenda was to give my “Mining the Hidden Web” tutorial. I always enjoy these sessions and learn from those who participate. I then spent time with David Jones, who has just taken on the position of liaison to the physics department at the U. of Alberta. The science and technology library at the U. of Alberta is currently undergoing renovation, taking advantage of one of Edmonton’s two seasons: winter and construction.

Michio Kaku, a faculty member at the City College of New York and author of “Physics of the Impossible,” spoke on that topic. He will host a TV series (Sci Q Sundays) this month. For details, see: http://science.discovery.com/tv/sci-q/about/about.html. Michio is a very engaging and entertaining speaker whose talents can be experienced on YouTube: http://www.youtube.com/watch?v=PW8rgKLPHMg

As chair of the Committee on Professional Concerns, I facilitated the committee meeting, attended sessions that the committee sponsored, and attended Programs I and Programs II meetings to help plan the next national meetings.

The session “Graduate Education in Physics: Which Way Forward?” was a follow-up to the conference itself, which was held in late January at the American Center for Physics. I have great interest in the recommendation of the Task Force on Graduate Education in Physics that physics graduate students be offered instruction in information fluency.

The session “Rethinking the Upper-Level [physics] Curriculum” was arranged by Cornell University alum Dr. Ernie Behringer. Attending sessions like this helps me learn more about what faculty and students need at various levels in my work with Cornell faculty and grad students.

“Scientific Communication and Writing” was an excellent session. I was especially impressed with the program developed by the husband-and-wife team of Dr. Dan Budny, U. of Pittsburgh, and Dr. Teresa Larkin, American University. In collaboration with librarians, the English department, and the writing center, they have developed an excellent writing program for engineering and physics students. At the end of the year, the students work in groups to prepare papers for their own conference. Details are here: http://www.engr.pitt.edu/%7Eeng11/.

As always, the physics demonstration show was both entertaining and informative. After the conference I traveled to the Canadian Rockies’ Jasper National Park–the trip was spectacular.

Posted in Topics: Education, Mathematics, Repositories, Science, Social Studies, Technology, physics

RepoCamp at the Library of Congress

If you cross Amazon.com CTO Werner Vogels’ “Two Pizza Team Rule” with what David Flanders, Project Manager, The Bloomsbury Colleges, and organizer of the summer of 2008 “Repository Road Shows,” compares to the “Penny Universities” of the 18th century that were often convened in taverns, or to the work life of Shakespeare, who more than likely developed his best collaborative plays around a pub table, you will get a ‘RepoCamp.’ The open, non-territorial, and thought-provoking slogan for this series of events for knowledge managers of every stripe, “The coolest thing to do with your data will be thought of by someone else,” was interesting enough to inspire about 25 people to attend RepoCamp at the Library of Congress in Washington, D.C. on July 25, 2008.

The idea for RepoCamp came out of UK “BarCamp” and “Unconference” events sponsored by the JISC Common Repository Interfaces Group (CRIG). These unstructured, rapid prototyping events are designed to speed up on-the-fly innovation. Instead of spending time in meetings discussing possibilities, RepoCamp participants quickly explain ideas and write code together in a friendly environment.

A typical day at RepoCamp goes something like this: sharing five-minute “elevator pitches” loosely based on what’s currently inspiring or bothering participants about managing, developing or running a repository; self-organizing around flip charts with notes from pitches so that people can gather to contribute insights around particular ideas; ad hoc prototyping with selected “gurus” who coordinate progress and help grab services off the web; and sharing conclusions with a new round of elevator pitches based on outcomes that can include step-by-step paper prototypes, working interfaces or brand new ideas. The real RepoCamp wrap-up is traditionally conducted at a local bar, where the best ideas seem to emerge. “Let’s make a lot of mistakes and make them fast,” is an often-repeated RepoCamp direction, says Flanders. More challenging issues such as scalability, robustness, and interoperability are post-RepoCamp fodder.

CRIG takes an inclusive view of knowledge management in interfacing repositories with other services. Rachel Bruce, founder of CRIG along with Rachel Heery, observes, “These issues are global and not something to be dealt with solely within national boundaries.” The opportunity to reach out to developers in the U.S. to create solutions with a series of RepoCamp events grew out of collaborations that were already taking place with DSpace, EPrints, and Fedora developers.

Sandy Payette, Executive Director of Fedora Commons, home of Fedora open source repository software, sees RepoCamp and other emerging programming events as being particularly useful for developers who enjoy social networking around things that matter to them, most often problem solving that leads to rapid prototyping. “It’s a way to gauge interest on the spot,” she said.

The DSpace Foundation and Fedora Commons recently announced plans to collaborate based on meetings held this spring where members of DSpace and Fedora Commons communities discussed multiple dimensions of cooperation and collaboration between the two organizations. Ideas included leveraging the power and reach of open source knowledge communities by using the same services and standards in the future. The organizations will also explore opportunities to provide new capabilities for accessing and preserving digital content, developing common web services, and enabling interoperability across repositories.

JISC CRIG saw this and other community efforts towards achieving greater integration and interoperation as an opportunity to host the U.S. Repository Roadshow that wrapped up at the Library of Congress on July 25. The JISC CRIG team would like to extend thanks to Ed Summers for arranging the LOC venue.

JISC will sponsor an academic developer-focused event in 2009 that will utilize RepoCamp ideas (Flanders suggests, for example, that the conference dinner might be something like a massive video game party) to continue to work towards that elusive but worthy goal of “interoperability” by building relationships among developers and programmers across academia. Look for an official announcement of JISC’s “Developer Happiness Days” early in 2009.

Posted in Topics: Education, Fedora, Open Source, Repositories, Science

Reality Check: SIGGRAPH 2008

Los Angeles is a town where reality is reinvented on a daily basis. Even so, the exhibits, talks and media presented at SIGGRAPH 2008 pushed the limits of perceived reality with a provocative theme exhorting participants to “Evolve.” However, the almost 30,000 graphics and robotics researchers, entertainment industry representatives, educators, programmers, artists and students from 87 countries who attended the 35th International Conference and Exhibition on Computer Graphics and Interactive Techniques, held at the Los Angeles Convention Center August 11-15, 2008, appeared to be mostly human. Other already-evolved types of attendees such as Quasi the Robot were clearly distinguishable, which may not be the case in the future.

This rich atmospheric still image is from “Nature Tzu-jan” by Ari Rubenstein, Curv Studios.

As the field of computer graphics has advanced over the last quarter of a century the simulation of reality has come into its own as an art form. SIGGRAPH 2008 artists and programmers presented computer graphics imagery that embodied inherent aspects of the medium such as exposed wire frame underpinnings and subtle textures, just as the qualities of paint, ink, stone and clay have always been used to express thoughts, feelings and ideas.

The graphic icon that was used on posters and signage throughout SIGGRAPH is a 3D holographic, machine-like image entitled “Animation Mother” by Meats Meier. The being appears human-like and yet is composed of distinctly recognizable computer animation elements.

The SIGGRAPH Computer Animation Festival featured a sensory cornucopia of screenings representing a wide range of media examples from all over the world. The event was hosted by Pixar Animation Studios, Sony Pictures Imageworks, and Lucasfilm in the state-of-the-art NOKIA Theater.

Ed Catmull, President, Walt Disney and Pixar Animation Studios, opened the conference with a look back over his pioneering career in managing work groups who were responsible for creating groundbreaking full-length animated films such as “Toy Story” and “Finding Nemo.”

RenderMan® is a high-quality Pixar rendering product used for making feature films that was collaboratively developed by engaging a community of digital effects and computer graphics companies. Catmull said, “It has been a standard for 20 years.” He continued, “We set a complexity goal that we thought was impossible and have far exceeded our goals.” The success of the RenderMan® development process demonstrated the benefits of open sharing and decision-making.

There is a lot of pressure to ‘get it right the first time’ when making feature films because mistakes are so expensive. Production managers are sometimes seen as a roadblock to artists and programmers who are actively involved in creative processes. Catmull emphasized that communication in a creative environment should happen between anybody at any time. He believes that balancing this complex work culture was made easier because early animation teams believed that they were making history.

On the heels of Catmull’s talk the Walt Disney Company announced that they will open a research and development lab at Carnegie Mellon University to engage top technology for its theme parks, television networks and animation studios. The Disney Research Pittsburgh Lab is scheduled to open this fall.

This is a computer rendering of an ancient Roman market area from the Rome 1.0 model that visitors were invited to ‘walk through’ at the Rome Reborn exhibit booth.

Of particular interest to anyone looking for new ways to interact with complex data was the Rome Reborn multimedia exhibit at SIGGRAPH. The exhibit was the result of an international collaboration led by The Institute for Advanced Technology in the Humanities at the University of Virginia that includes industry partner IBM, the German Archeological Institute, the Universite de Michel de Montaigne-Bordeaux III, the Universite de Caen, the Politecnico di Milano and UCLA. These rich data sets (Rome 1.0 and Rome 2.0) depict Rome as it might have appeared in A.D. 320. The models include the hills, valleys and water features of the city, where over one million people lived and worked in 7,000 located and identified buildings. Significant architectural, political and social structures and monuments are represented in great detail.

The exhibit was divided into several sections where each of the partners demonstrated devices and technology that allowed viewers to participate in the life of ancient Rome in new ways. Comparative hand-held walking tour devices, large scale viewing screens, three dimensional depictions, and interactive displays are all ways that this “data” can be experienced. Creators of Rome Reborn believe that this model is an accurate representation of what a visit to the city would have been like during the time of Constantine the Great.

Computer and Robotics Professor Takeo Kanade, who is also the Director of the Quality of Life Technology Engineering Research Center at Carnegie Mellon, gave the final keynote of the conference and looked to a balanced future where robots and machines would provide humans with just enough assistance to improve their lives, but not take over. Most SIGGRAPH attendees that I spoke with were overwhelmed by the volume and variety of information that was squeezed into a few action-packed days, but as they say, that’s entertainment. And as Kanade concluded, this field is fun.

Posted in Topics: Education, Mathematics, Open Source, Science, Social Studies, Technology, computer animation, computer graphics

NECC: Cheryl Lemke on Learning, Innovation, and Soda Cans

Cheryl Lemke, CEO and President of Metiri Group, a consulting firm dedicated to advancing technology in schools, presented to a full house of educators, technology coordinators and other educational technology professionals at this week’s National Educational Computing Conference (NECC) held in San Antonio, Texas. In her session, The Ripple Effect: 21st Century Innovations That Matter, Lemke asserted that constructivist, high-technology approaches to learning greatly influence students’ abilities to reach a deeper understanding of content, and she challenged audience members to become agents of change in our schools. Lemke pointed to John Bransford’s work on learning by design: creating authentic learning experiences within a greater context, as opposed to learning in discrete parts without grounding information in purpose and meaning.

Identifying real situations that require students to solve problems for an invested “client” is an ideal method of teaching in this manner. As an example, Lemke noted Telannia Norfar, a math teacher with an interesting approach to teaching demonstrated on her Logic Inc. blog. Students look at data using algebraic equations to determine which cell phone provider is the most cost effective. Through data analysis, students come up with a business proposal on how to improve the quality and cost of food in the school cafeteria. These examples show how teachers and students can identify real-life situations that provide a context for learning that is both authentic and meaningful.
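The cell phone comparison described above comes down to two linear cost equations and the point where they cross. Here is a minimal Python sketch of that kind of exercise. The two plans, their fees, and the function names are hypothetical and chosen only for illustration; they are not taken from Norfar’s blog or from any real provider.

```python
# Hypothetical plans, in cents, so the arithmetic stays exact:
# each plan is (monthly base fee, price per minute).
PLAN_A = (2000, 10)  # $20.00 per month plus $0.10 per minute
PLAN_B = (3500, 5)   # $35.00 per month plus $0.05 per minute


def monthly_cost(plan, minutes):
    """Linear cost model: cost = base fee + rate * minutes."""
    base_fee, rate = plan
    return base_fee + rate * minutes


def break_even_minutes(plan_a, plan_b):
    """Solve base_a + rate_a * m = base_b + rate_b * m for m."""
    (base_a, rate_a), (base_b, rate_b) = plan_a, plan_b
    if rate_a == rate_b:
        return None  # parallel lines: the costs never cross
    return (base_b - base_a) / (rate_a - rate_b)


def cheaper_plan(minutes):
    """Name the cheaper hypothetical plan for a given monthly usage."""
    cost_a = monthly_cost(PLAN_A, minutes)
    cost_b = monthly_cost(PLAN_B, minutes)
    return "A" if cost_a <= cost_b else "B"
```

With these made-up numbers, a student who talks less than the break-even number of minutes is better off with plan A, and a heavier user with plan B, which is the kind of conclusion a “client” proposal could rest on.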

Key to this approach is bearing in mind what brain-based learning teaches us about how people process information in “chunks” and the need for scaffolding information so as not to overload the learner. Scaffolding is a process of building knowledge, guiding the learner into greater understanding and context. This process also helps students become self-directed learners. Lemke used the example of a guided inquiry activity to determine whether a sweater or aluminum foil keeps a soda can cooler over time (my mother would debate the use of aluminum foil as the most effective way, but students investigating this may find otherwise!)

Lemke states that providing authentic learning experiences for students correlates with higher test scores due to the deeper contextual meaning they provide through “learning with understanding”, an idea not held by all in the educational community.

Lemke then discussed the “culture of collaboration” and the need for more collaborative learning environments to enhance students’ abilities to address concepts involving more complex thinking as the research would suggest. Innovations in other fields can teach educators ways to infuse more collaborative strategies in teaching. In the business world, for example, businesses such as Cisco and Best Buy have created online social networks of support with customers helping other customers out with technical problems and the Best Buy “blue shirt” support network.

Web 2.0 technologies lend themselves to collaborative environments where students can create networks for sharing and distributing information, but bearing in mind Lemke’s emphasis on constructivist approaches, educators must think of creative ways of addressing content in authentic contexts.

Posted in Topics: General

Special Libraries Association 2008

SLA June 15-18, 2008
Seattle, Washington

As usual, my activities were mainly with the Physics-Astronomy-Mathematics (PAM) division of SLA. At the PAM-wide roundtable discussion, I gave a brief report about the meeting “Graduate Education in Physics: Which Way Forward”, at which I had given a poster presentation on information fluency. I gave a two-hour poster presentation on comPADRE with Dr. Bruce Mason, Principal Investigator for the comPADRE project. comPADRE is the physics and astronomy portion of NSDL. Bruce and I also presented together at the Physics Round Table discussion. We are proposing that science librarians can help spread the word about comPADRE (and all of NSDL) by attending regional meetings of groups such as the American Physical Society, the American Association of Physics Teachers, etc. I also attended the Astronomy Round Table discussion. There was a discussion about the usefulness of and demand for paper pre-prints now that articles are on-line. Two groups of faculty still want the paper: older faculty who have always done it that way and the newer faculty who want copies to send to family and friends. The Institute of Physics announced a change in the pricing structure for the astronomy journals that they took on in 2008 (formerly published by the U. of Chicago Press).

I attended a session “The Science of Coffee”, which was quite interesting. The speaker was Dr. Joe Vinson of the University of Scranton. His presentation will be on the SLA web site soon. His article “Take Two Cups of Coffee and Call Me Tomorrow” is on his web page: http://academic.scranton.edu/department/chemistry/

I also attended the session “Alternative Fuels: Technologies for a Healthy Planet”. Dr. Richard Nelson, Kansas State University, and Alvetta Pindell, from the Information and Research Services Branch of the National Agricultural Library, spoke. Their presentations will soon be on the SLA web site. An interesting web site that was mentioned is:
http://www.biodiesel.org/

Quite a bit of my time was spent with vendors. Thomson Reuters demonstrated a new “vertical search” engine that is currently in beta test. The person who demonstrated the system said: “Using context shrinks the haystack and makes the needle bigger.” He contends that the new search engine will eliminate spam and move the most relevant sites to the top of the list.

AIP is pleased to announce that Physics Today no longer has a one year embargo and is available back to issue one. The one year embargo applied to institutional subscriptions. Faculty got very frustrated by that.

Sara Tompson, at the University of Southern California, attends physics colloquia and prepares bibliographies related to the topic. The physics faculty and grad students look forward to the service. http://isd.usc.edu/~sarat/PhysColloqBibliogs.html

The winner of the International Scholarship this year is Mandy Taha, Senior Research Services Librarian, Bibliotheca Alexandrina. She spoke at the PAM-wide roundtable. The facility is quite spectacular.
http://www.bibalex.org/english/aboutus/building/facts.htm

Scitopia celebrated its one-year anniversary. They have added more publishers to the system this year. (scitopia.org)

As always, I enjoyed meeting with my colleagues and sharing ideas.

Pat Viele

Posted in Topics: General

The Petabyte Problem: Scrubbing, Curating and Publishing Big Data

Galaxy Zoo

One strategy for classifying the millions of galaxies mapped by the Sloan Digital Sky Survey was to open the Galaxy Zoo, invite the public to look at the new creatures, and give them tools to record their observations.

When Alex Szalay is not considering improved strategies for managing and sharing big data, and how that might be an effective force for advancing science, he is the lead guitarist for the jazz and progressive rock band Panta Rhei. Szalay presented the third and final keynote, “Scientific Publishing in the Era of Petabyte Data,” at JCDL on June 19, 2008.

He opened with a look at the evolution of science: 1,000 years ago science was empirical; during the last few hundred years science was theoretical, using models and generalizations; a computational branch emerged in the last few decades; and today science is about data exploration.

Scientific data doubles every year, which has fundamentally changed the nature of scientific computing. Today scientific computing cuts across disciplines and has become unwieldy, making it more difficult to extract knowledge. He noted that 20% of the world’s servers are feeding information to big data centers (Google, MSN, Yahoo, Amazon, and eBay), so it’s not just about scientific data.

Szalay has been personally involved in the exponential growth of astronomy data from the late 1990s to 2008 due to his role with the Sloan Digital Sky Survey (SDSS) that has been “mapping the universe” as part of the Virtual Observatory activities for the last ten years. SDSS is now complete, and is in the process of developing the final data release. The completed SDSS archive will contain over 100 terabytes and will be managed by Johns Hopkins University. Sky Survey user sessions show a constant and increasing use of the SDSS data.

Data versioning was SDSS’s biggest challenge, and he emphasized that there is a need to develop automation for more of the steps in curating data for final publication: collection, raw, calibrated, and derived.

Szalay believes that scientific discoveries are made at the edges and boundaries of large data sets: the places where you might not naturally be looking. The more connections that can be made among data sets, the more likely it is that something new will be discovered along the edges, which suggests that data federation is significant.

Scientific projects that generate data are often short term, lasting 3-5 years. Data is only “uploaded” at the end of a project, so the data will never catch up with the published discoveries. He advocates for projects becoming more active data curators and publishers further upstream in the investigative process.

One way to do this is to consider methods for “taking the analysis to the data” to manipulate data at the database.
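As a rough illustration of what “taking the analysis to the data” can mean in practice, the sketch below contrasts pulling every row out of a database and averaging it in client code with asking the database to compute the aggregate itself. It uses Python’s built-in sqlite3 module and a tiny made-up galaxy table; the table name, columns, and values are assumptions for illustration only and have nothing to do with the actual SDSS schema.

```python
import sqlite3


def build_demo_catalog():
    """Create a tiny in-memory table of made-up galaxy records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE galaxies ("
        "id INTEGER PRIMARY KEY, morphology TEXT, redshift REAL)"
    )
    rows = [
        (1, "spiral", 0.02),
        (2, "elliptical", 0.10),
        (3, "spiral", 0.04),
        (4, "merger", 0.08),
        (5, "elliptical", 0.06),
    ]
    conn.executemany("INSERT INTO galaxies VALUES (?, ?, ?)", rows)
    return conn


def mean_redshift_client_side(conn):
    """Data to the analysis: fetch every row, then average in Python."""
    totals = {}
    query = "SELECT morphology, redshift FROM galaxies"
    for morphology, redshift in conn.execute(query):
        running_sum, count = totals.get(morphology, (0.0, 0))
        totals[morphology] = (running_sum + redshift, count + 1)
    return {m: s / n for m, (s, n) in totals.items()}


def mean_redshift_in_database(conn):
    """Analysis to the data: the database computes the aggregate and
    returns one row per morphology instead of one row per galaxy."""
    query = "SELECT morphology, AVG(redshift) FROM galaxies GROUP BY morphology"
    return dict(conn.execute(query).fetchall())
```

Both functions give the same answer on five rows. The difference only matters at scale: the second version moves a handful of summary rows across the wire, while the first would have to move the whole catalog.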

In any scenario he noted that finding the right data to answer a question cannot be optimal because data is fuzzy and machine resources are limited. Next generation data analysis will require a combination of statistics and computer science to create novel data structures and randomized algorithms.

Szalay suggests that Power Laws arise in social systems where people are faced with many choices, such as in the analysis of enormous data sets: more choices make the distribution, or long tail, more extreme. People’s choices, made by brains that are naturally designed to sort, order, and balance, affect one another and are not random events. He cited long tail distribution observations including those of Pareto, who suggested that 20% of the population holds 80% of the wealth, and more recently those of Chris Anderson, who believes that everything on the web is a Power Law.
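To make the 80/20 observation concrete, here is a small Python sketch based on the standard Pareto distribution, for which the share of the total held by the top fraction p of the population is p raised to the power (1 - 1/alpha), valid for a shape parameter alpha greater than 1. The function names are invented for this example, and the calculation is a textbook property of the Pareto distribution, not something presented in the keynote.

```python
import math


def top_share(top_fraction, alpha):
    """Share of the total held by the richest `top_fraction` of a
    Pareto-distributed population with shape parameter `alpha` (> 1)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite unless alpha > 1")
    return top_fraction ** (1 - 1 / alpha)


def alpha_for_rule(top_fraction, share):
    """Shape parameter for which `top_fraction` of people hold `share`."""
    exponent = math.log(share) / math.log(top_fraction)
    return 1 / (1 - exponent)


# The classic rule: 20% of the population holds 80% of the wealth.
alpha_80_20 = alpha_for_rule(0.2, 0.8)
print(f"shape parameter for 80/20: {alpha_80_20:.3f}")
print(f"share held by the top 1%: {top_share(0.01, alpha_80_20):.3f}")
```

The shape parameter that reproduces the 80/20 rule works out to log(5)/log(4), a little under 1.2. Smaller values of alpha give a heavier tail and an even more lopsided split.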

He suggests that there is a science project pyramid: single lab at the base, multi-campus in the center, and international consortia on top. Often a scientific discipline will recognize the need for major “giga” initiatives, such as supercomputing research, that are highly collaborative and distributed. The output from these efforts at every scale contains:

–Literature

–Derived and re-combined data

–Raw data

Szalay would like to see a continuous feedback loop among these three aspects where data and analysis are always updating.

To answer the question, “How can you publish data so that others might recreate your results in 100 years?” he referred to Gray’s laws of Data Engineering: scientific computing revolves around data; scale out the solution for analysis; take the analysis to the data; start with 20 queries; and go from working to working.

One successful experiment in scaling out the solution for analysis came about because the Sloan Digital Sky Survey generated more data than scientists have time to study or classify, coupled with the fact that astronomy is attractive to the public. Astronomers asked citizens for help in classifying over a million galaxies by establishing the Galaxy Zoo.

This public science analysis solution has received enormous publicity and has allowed 100,000 citizens from all over the globe to contribute to discovery by helping to classify galaxies online while viewing beautiful images of unknown locations in the universe. For example, a Dutch teacher found and called attention to an object that she had no experience in analyzing. Her observation turned out to be a significant discovery: the object proved to be a Voorwerp.

Szalay believes that the educational impact of this work is enormous. Data sharing and publishing would benefit from the establishment of specialized journals for data. He emphasized that scholarly communications are no longer characterized by a paper trail, but rather by an email trail along with resources collected by the Internet Archive, wiki pages, some science blogs, collaborative workbenches, and even instant messages.

Technology plus sociology plus economics must come together to continue to work on how to preserve our intellectual data resources. Any one discipline alone is not enough to solve the data deluge problem. Both the promise and the unpredictability of increased participation in citizen science are yet another unknown. If there are thousands of new discoveries each day in public science, is there any way to know how this will scale, or does this create a horrifying potential for even more data?

Posted in Topics: Fedora, General, Repositories, Science, Technology
