Radosław Michalski - home page
Datasets
Self-prepared datasets:

Spreading processes in virtual world platform
Description: The data records five spreading campaigns that took place on a virtual world platform. During these campaigns, users distributed avatars among one another. The campaigns were either incentivized or not incentivized, and varied in time and range. The campaign data is accompanied by events that can be used to build a multilayer network, which places the campaigns in a wider context (friendships, messages, transactions, etc.).
Number of nodes: 954,722
Number of timestamped edges: 51,750,836
Citation: Jankowski, J., Michalski, R., Bródka, P.: A multilayer network dataset of interaction and influence spreading in a virtual world. Scientific Data, 4, 170144 (2017)
Citation (BibTeX): jankowski2017multilayer.bib (txt)
Download: 2.2 GB download from Harvard Dataverse
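
As a rough illustration of how such events could be grouped into a multilayer network, the sketch below collects directed edges per layer and then lists the layers on which a given pair of users is connected. The tuple layout `(layer, source, target, timestamp)`, the layer names, and the user IDs are assumptions made for this example only; they are not the file format of the published dataset.

```python
from collections import defaultdict

# Hypothetical events: (layer, source, target, timestamp).
# The field order, layer names, and IDs are assumptions for illustration.
events = [
    ("friendship", 10, 11, 1),
    ("message", 10, 11, 2),
    ("message", 11, 12, 3),
    ("transaction", 12, 10, 4),
    ("message", 10, 11, 5),  # repeated contact on the same layer
]


def build_layers(events):
    """Group events into one set of directed edges per layer."""
    layers = defaultdict(set)
    for layer, source, target, _timestamp in events:
        layers[layer].add((source, target))
    return dict(layers)


def layers_connecting(layers, source, target):
    """Return the sorted names of the layers that contain source -> target."""
    return sorted(
        name for name, edges in layers.items() if (source, target) in edges
    )


layers = build_layers(events)
shared = layers_connecting(layers, 10, 11)
```

Storing edges as sets drops repeated contacts, which is enough for checking in which contexts two users interacted; a counter per edge would be needed to keep interaction intensity.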

Manufacturing company e-mail communication and organizational structure
Description: History of internal e-mail communication (sender, recipient, datetime) between employees of a mid-sized manufacturing company. Multiple recipients of the same e-mail (To, CC, BCC) are represented as separate rows, without distinguishing the recipient type. In this version, the organizational structure of the company (who reports to whom) is published alongside the communication metadata. The period covered is nine full months of 2010, from 2010-01-01 to 2010-09-30 (event dates in local time).
Number of nodes: 167
Number of timestamped edges: 82,927
Citation: Nurek, M., Michalski, R.: Combining Machine Learning and Social Network Analysis to Reveal the Organizational Structures. Applied Sciences, 10(5), 1699 (2020)
Citation (BibTeX): nurek2020combining.bib (txt)
Download: 2.5 MB download from Harvard Dataverse
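
Because each recipient of an e-mail is stored as a separate (sender, recipient, datetime) row, the rows can be aggregated directly into a weighted directed network. The following is a minimal sketch under stated assumptions: the sample rows, the employee IDs, and the `%Y-%m-%d %H:%M:%S` timestamp format are hypothetical and may differ from the published files.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (sender, recipient, datetime).
# IDs and the timestamp format are assumptions, not the published format.
rows = [
    ("1", "2", "2010-01-04 09:15:00"),
    ("1", "3", "2010-01-04 09:15:00"),  # same e-mail, second recipient
    ("2", "1", "2010-01-04 10:02:00"),
    ("1", "2", "2010-02-11 14:30:00"),
]


def build_weighted_edges(rows, start, end):
    """Count rows per (sender, recipient) pair with start <= time < end."""
    weights = Counter()
    for sender, recipient, stamp in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if start <= when < end:
            weights[(sender, recipient)] += 1
    return weights


# Edge weights for January only, and for the whole covered period.
january = build_weighted_edges(rows, datetime(2010, 1, 1), datetime(2010, 2, 1))
whole = build_weighted_edges(rows, datetime(2010, 1, 1), datetime(2010, 10, 1))
```

Using a half-open time window makes it straightforward to split the nine months into consecutive, non-overlapping snapshots.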

Bitcoin addresses and their categories
Description: The dataset contains Bitcoin addresses that have been identified as belonging to one of the following categories: mining pools, miners, coinjoin services, gambling services, exchanges, other services (8,008 addresses in total). The labels come from two sources, plausible assumptions and external services, and are not guaranteed to be error-free. These labels have been used for training and validating machine learning algorithms that discover the types of addresses.
Number of addresses: 8,008
Citation: Michalski, R., Dziubałtowska, D., Macek, P.: Revealing the Character of Nodes in a Blockchain with Supervised Learning. IEEE Access, 8, 109639-109647 (2020)
Citation (BibTeX): michalski2020revealing.bib (txt)
Download: 0.5 MB download from Harvard Dataverse

Other dataset sources:
  • KONECT - The Koblenz Network Collection (URL)
  • SNAP - Stanford Large Network Dataset Collection (URL)
  • Alex Arenas network data sets (URL)
  • Network Repository (URL)
  • Index of Complex Networks (URL)
Contact me
Wrocław University of Science and Technology
Department of Artificial Intelligence
Radosław Michalski

Wybrzeze Wyspianskiego 27
50-370 Wroclaw
Poland

Bldg. D-21, room 231



My GPG key (about GPG)

Phone no. +48 71 320 34 53

University calendar

PWr academic calendar


    Radosław Michalski © 2011-2021