Social tagging vs. indexation: Comparing an educational web portal to social bookmarking
Author(s):
Peter Böhm (presenting / submitting)
Conference:
ECER 2014
Format:
Paper

Session Information

12 SES 08, Paper Session

Time:
2014-09-04
09:00-10:30
Room:
B338 Sala de Aulas
Chair:
Alexander Botte

Contribution

Social tagging services allow users to freely annotate digital objects, resulting in data structures known as folksonomies. Social bookmarking services (SBS) apply social tagging to web resources by facilitating the collaborative collection and annotation of favorite web sites. Several comparisons, both quantitative and qualitative, between user-generated annotations and their professionally created counterparts have been conducted, though mostly for library records. Such comparisons are of interest to providers of digital libraries and related products because they offer insights into how internet users with different degrees of expertise annotate resources. One of the first analyses of folksonomies was conducted by Golder and Huberman (2006). In a quantitative comparison of keywords from the Library of Congress and social tags from LibraryThing for 8,562 book records, Lu et al. (2010) show that although only 2.2% of social tags are also used as keywords, these common terms account for 50.1% of the keywords. The common terms are also used much more often as social tags than other terms (average frequency of 33.5 versus 5.3). Rolla (2009), on the other hand, conducted a qualitative study of 45 records from the same two sources and found that the social tags include both broader and narrower terms than the keywords, but always add at least one content-related concept not present in the keywords.
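The asymmetry reported by Lu et al. (2010), a small share of the tag vocabulary covering a large share of the keyword vocabulary, can be illustrated with a minimal sketch. It assumes that both percentages refer to distinct terms; the function name and the toy data are hypothetical and not taken from the cited study.

```python
def vocabulary_overlap(tags, keywords):
    """Return the share of distinct tags and the share of distinct
    keywords that belong to the common vocabulary of both sources."""
    tag_vocab = set(tags)
    keyword_vocab = set(keywords)
    common = tag_vocab & keyword_vocab
    return len(common) / len(tag_vocab), len(common) / len(keyword_vocab)


# Toy data (invented): a large, noisy tag vocabulary and a small keyword vocabulary.
tags = ["fiction", "history", "to-read", "favorites", "war",
        "owned", "paperback", "classic", "wishlist", "novel"]
keywords = ["history", "war", "military history", "biography"]

share_of_tags, share_of_keywords = vocabulary_overlap(tags, keywords)
print(f"{share_of_tags:.0%} of tags are keywords")
print(f"{share_of_keywords:.0%} of keywords are tags")
```

Because the tag vocabulary is much larger than the keyword vocabulary, the same set of common terms yields a small percentage in one direction and a large one in the other.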

This paper presents a comparison of professionally assigned keywords for web resources from an editorial, educational web portal and their respective user-generated annotations from SBS. The study focuses on a statistical description of the two types of data sources and an analysis of their agreement. This description covers properties such as the number and length of terms, at both the resource and the vocabulary level, as well as term frequency. The aim is to identify structural similarities and differences between the two types of annotation.

Method

About 40,000 resources from two databases of the German Education Server (GES) are compared against three SBS datasets. The GES is an edited web portal providing metadata for web resources as well as a unified browsing structure. The three SBS datasets are Delicious, Bibsonomy and Edutags. Delicious is a general-purpose SBS with a broad user base. Bibsonomy is a combined social bookmarking and social cataloging system aimed at researchers. As the most specialized SBS of the three, Edutags collaborates with the GES and is targeted at teachers. The divergent data from the different sources (GES and SBS) are imported into a purpose-built relational database. To achieve a higher agreement between the semi-controlled keywords and the freely allocated social tags, the social tags are preprocessed (i.e. split, trimmed) using basic string methods and normalized using NLP methods (language identification, lemmatization). The actual analysis is then conducted using SQL queries and statistical tests.
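The preprocessing step might look roughly like the following sketch. It is not the implementation used in the study: the delimiters (comma and semicolon), the lowercasing, and the function name preprocess_tags are assumptions, and the lemmatizer is passed in as a hypothetical callable because the abstract does not name an NLP tool. Language identification is left out.

```python
import re


def preprocess_tags(raw, lemmatize=None):
    """Split a raw tag string, trim and lowercase each term, optionally
    lemmatize it, and drop empty terms and duplicates (order preserved)."""
    if lemmatize is None:
        lemmatize = lambda term: term  # identity: no lemmatization
    terms = []
    for part in re.split(r"[,;]", raw):  # assumed delimiters
        term = part.strip().lower()      # lowercasing is an assumption
        if not term:
            continue
        term = lemmatize(term)
        if term not in terms:
            terms.append(term)
    return terms


# Toy lemma table standing in for a real lemmatizer (hypothetical).
toy_lemmas = {"schulen": "schule"}

plain = preprocess_tags(" Schule, Unterricht ;schule,, E-Learning ")
lemmatized = preprocess_tags("Schulen, Schule",
                             lemmatize=lambda t: toy_lemmas.get(t, t))
print(plain)
print(lemmatized)
```

Normalizing both inflected forms to one lemma is what allows a social tag such as "Schulen" to match a keyword such as "Schule" in the later agreement analysis.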

Expected Outcomes

Preliminary results show varying degrees of URL overlap between the GES and the three SBS datasets. The highest overlap is found between the GES and Delicious, with about 25% and 40% of the URLs from the two GES databases, respectively, also present in Delicious. As expected, the statistical metrics vary between the datasets. For example, the average number of terms per resource (bookmark or GES record) varies between 2.5 and 7.6, with GES records showing a higher average number of terms than SBS bookmarks. This indicates a more thorough indexation by professionals than by SBS users. The average number of resources per term, on the other hand, also differs considerably between the datasets but follows a different distribution. Here, two SBS datasets have the lowest and highest means (3.3 for Delicious and 14.0 for Bibsonomy), with the other SBS dataset and the GES databases lying in between (5.3 to 9.4). The computation of the term agreement is in preparation. It is expected to exhibit similar differences between the datasets. For example, given the topical proximity between Edutags and the GES, a high term agreement is anticipated.
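The three metrics reported above (URL overlap, terms per resource, resources per term) can be sketched as follows. The study itself computes them with SQL queries on a relational database; this Python version is only an illustration, with hypothetical function names, invented toy data, and the assumption that each dataset is a non-empty mapping from URL to a set of terms.

```python
from collections import defaultdict


def url_overlap(source, target):
    """Share of URLs in `source` that are also present in `target`."""
    source_urls = set(source)
    return len(source_urls & set(target)) / len(source_urls)


def mean_terms_per_resource(annotations):
    """Average number of distinct terms assigned to a resource."""
    return sum(len(terms) for terms in annotations.values()) / len(annotations)


def mean_resources_per_term(annotations):
    """Average number of resources to which a term is assigned."""
    resources_by_term = defaultdict(set)
    for url, terms in annotations.items():
        for term in terms:
            resources_by_term[term].add(url)
    total = sum(len(urls) for urls in resources_by_term.values())
    return total / len(resources_by_term)


# Toy data (invented): URL -> set of terms.
ges = {
    "http://example.org/a": {"school", "teaching", "media"},
    "http://example.org/b": {"school", "curriculum"},
    "http://example.org/c": {"teaching"},
    "http://example.org/d": {"school", "media"},
}
sbs = {
    "http://example.org/a": {"school"},
    "http://example.org/x": {"fun", "school"},
}

print(url_overlap(ges, sbs))
print(mean_terms_per_resource(ges), mean_terms_per_resource(sbs))
print(mean_resources_per_term(ges), mean_resources_per_term(sbs))
```

Note that the overlap is directional: it is measured relative to the first dataset, which matches the way the abstract reports the share of GES URLs also found in Delicious.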

References

Golder, Scott A.; Huberman, Bernardo A. (2006): Usage patterns of collaborative tagging systems. In Journal of Information Science 32 (2), pp. 198–208. DOI: 10.1177/0165551506062337.

Lu, Caimei; Park, Jung-ran; Hu, Xiaohua (2010): User tags versus expert-assigned subject terms: A comparison of LibraryThing tags and Library of Congress Subject Headings. In Journal of Information Science 36 (6), pp. 763–779. DOI: 10.1177/0165551510386173.

Rolla, Peter J. (2009): User tags versus subject headings. Can user-supplied data improve subject access to library collections? In Library Resources and Technical Services 53 (3), pp. 68–77.

Author Information

Peter Böhm (presenting / submitting)
German Institute for International Educational Research (DIPF)
Information Center for Education
Frankfurt am Main
