       
 

Useful Datasets, Exercises, ... for the Cybersecurity Analytics Book and/or Security Enthusiasts

   
 
List of Chapters and Appendices of the Cybersecurity Analytics Book: Chapter 1: Introduction; Chapter 2: What is Data Analytics; Chapter 3: Security Basics and Security Analytics; Chapter 4: Statistics; Chapter 5: Data Mining - Unsupervised Learning; Chapter 6: Machine Learning - Supervised Learning; Chapter 7: Text Mining; Chapter 8: Natural Language Processing; Chapter 9: Big Data Techniques and Security; Appendix A: Linear Algebra; Appendix B: Graphs; Appendix C: Probability
(Slides, exercises) Slides for the book chapters and other security and privacy topics, along with some exercises, are here
A warning about Dataset Quality (I cannot personally vouch for the quality of all the datasets listed here). Check out the ACM CCS 2019 Data Quality paper here to see what can go wrong. It also has other helpful references on Security Analytics. Link to ACM Digital Library (please email me if you cannot download it)
(Malware) A repository for malware: VirusTotal
(Phishing) The IWSPA-AP Version 2.0 Phishing Datasets and other phishing-related datasets are available for academic research on request to me. Send, from your institutional email address, (a) the front and back of your valid institutional ID and (b) this signed NDA. The paper to cite in any publication that uses the Version 2.0 dataset is the ACM CCS 2019 paper by Verma et al. (Bib file of the paper here). Version 1.0 of the dataset was used in the 1st IWSPA-AP shared task (Proceedings of the shared task).
(News) Fake and real news: GitHub link
(Deception Datasets) The 5 datasets we used for our ACM CODASPY 2022 Poster on Deception can be found here
(Spam) The Spambase Dataset, a bit dated now, is available from the UCI repository (link below)
(Spam) TREC also organized spam competitions TREC Spam Dataset
(Security - General) A repository for security datasets Secrepo
(Security - General) Datasets and more information repository Impact Cyber Trust
(Security - General) University of Victoria datasets, including fake news detection, stylometry authentication, cloud security, botnet and ransomware detection, and behavioral biometrics: Link
(General) The well-known UCI repository has other datasets besides security UCI Archive of Datasets
(Malware) A 20-million-instance malware dataset released in December 2020: Link Here
(Network) LANL Link Here
(IoT) UNSW datasets including PCAPs Link Here
(IoT) Network Intrusion Dataset Link Here
(Botnet/Normal/Background Traffic) The CTU-13 datasets Link Here
(Collection of Security Datasets) Awesome-Cybersecurity-Datasets Link Here
(Top 10 Datasets for Cybersecurity Projects) According to Analytics India Magazine Link Here
(Collection of Cybersecurity Datasets) From University of New Brunswick in Canada Link Here
Thanks to David Marchette and Srini Srivathsan for sharing their lists. More to come. Check back frequently for updates. Please drop me an email if you have a security dataset that you would like to advertise here, if you know of any other security dataset(s), or if some link on this page is broken.