Course Introduction

Data

binary data
Individual mobile phone plan data remaining.
Cambridge Analytica (2013-2018) corporate logo
Russian group Midnight Blizzard accessed Microsoft corporate as well as customer email in the first half of 2024.
National Public Data, a background check service, reports theft of ~134 million unique records potentially including names, SSNs, and postal and email addresses.
National Public Data, a background check service, published the passwords to its own master database.
Right to be forgotten form @ google.com, not for US citizens.
National Health Service cookie notice.
Social media icons, more or less.
Sequence Read Archive growth since 2009, NIH.
Protein Data Bank growth since 1976.
European Nucleotide Archive growth since 1982.
1000 Cannabis project public data announcement, Medium.com.

FASTA


>Chain A, CRYSTAL STRUCTURE OF RNASE T1 WITH 3'-GMP AND GUANOSINE: A PRODUCT COMPLEX
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFN
ENNQLAGVITHTGASGNNFVECT

Lipman DJ and Pearson WR (1985) Rapid and sensitive protein similarity searches Science 227: 1435-41.
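
The FASTA layout above (one description line starting with ">", followed by sequence lines that may wrap) can be read with a few lines of Python. This is a minimal sketch, not a reference parser: the function name parse_fasta is hypothetical, and it assumes the description sits on a single line and that blank lines can be ignored.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted text.

    Assumption: each record starts with one '>' line; every following
    non-empty line up to the next '>' is part of the sequence.
    """
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


# The RNase T1 record from the slide, with the description on one line.
example = """\
>Chain A, CRYSTAL STRUCTURE OF RNASE T1 WITH 3'-GMP AND GUANOSINE: A PRODUCT COMPLEX
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFN
ENNQLAGVITHTGASGNNFVECT
"""

records = list(parse_fasta(example))
for description, sequence in records:
    print(description)
    print(len(sequence), "residues")
```

Joining the wrapped lines matters because line breaks inside a FASTA sequence carry no meaning; only the concatenated string is the sequence.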

MEME representation of secretory signal sequence from P. falciparum.

FASTQ

@This is a short sequence
GCATACCGCAGTCGACTCGTA
+
!'))FGVHII[abacdbac(!

Quality scores, low to high

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Cock PJA, Fields CJ, Goto N, Heuer ML and Rice PM (2010) The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants Nucleic Acids Res 38: 1767-71.
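
The quality line pairs one character with each base, and the characters listed low to high are the printable ASCII range starting at "!". A minimal sketch of decoding them follows, assuming the Sanger convention described by Cock et al. (score = ASCII code minus 33, error probability = 10^(-Q/10)); the function names are hypothetical, and Solexa/Illumina variants use different offsets.

```python
from typing import List

PHRED_OFFSET = 33  # assumption: Sanger FASTQ, where '!' (ASCII 33) means Q = 0


def phred33_scores(quality: str) -> List[int]:
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# The record from the slide: one quality character per base.
sequence = "GCATACCGCAGTCGACTCGTA"
quality = "!'))FGVHII[abacdbac(!"

scores = phred33_scores(quality)
for base, q in zip(sequence, scores):
    print(base, q, f"{error_probability(q):.2e}")
```

The first and last bases carry "!", the lowest possible score, which is why read ends are often trimmed before analysis.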

PDB

 
ATOM     32  N   THR A   5       8.318   1.454   8.777  1.00  9.53           N
ATOM     33  CA  THR A   5       9.128   0.477   8.010  1.00  9.06           C
ATOM     34  C   THR A   5       9.652   1.192   6.734  1.00  8.76           C
ATOM     35  O   THR A   5       8.805   1.679   5.987  1.00 10.45           O
ATOM     36  CB ATHR A   5       8.295  -0.786   7.598  0.50 10.51           C
ATOM     37  CB BTHR A   5       8.338  -0.785   7.522  0.50  8.84           C
ATOM     38  OG1ATHR A   5       8.191  -1.556   8.858  0.50 11.07           O
ATOM     39  OG1BTHR A   5       7.390  -1.287   8.528  0.50  8.05           O
ATOM     40  CG2ATHR A   5       8.800  -1.735   6.504  0.50 10.11           C
ATOM     41  CG2BTHR A   5       9.366  -1.870   7.129  0.50 13.39           C

Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Brice MD, Rodgers JR, Kennard O, Shimanouchi T and Tasumi M (1977) The Protein Data Bank: a computer-based archival file for macromolecular structures J Mol Biol 112: 535-42.
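
PDB ATOM records are fixed-width: each field lives in a set column range, so they are sliced by position rather than split on whitespace (note "OG1ATHR", where the atom name, alternate-location letter and residue name touch). A minimal sketch follows, assuming the standard ATOM column layout; the Atom class and parse_atom function are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    alt_loc: str      # alternate-location indicator, '' when absent
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom(line: str) -> Atom:
    """Slice one ATOM record by column position (0-based Python slices)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        alt_loc=line[16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# Three records from the slide: a backbone N and the two alternate CB positions.
lines = [
    "ATOM     32  N   THR A   5       8.318   1.454   8.777  1.00  9.53           N",
    "ATOM     36  CB ATHR A   5       8.295  -0.786   7.598  0.50 10.51           C",
    "ATOM     37  CB BTHR A   5       8.338  -0.785   7.522  0.50  8.84           C",
]

atoms = [parse_atom(line) for line in lines]
for atom in atoms:
    print(atom.serial, atom.name, atom.alt_loc or "-", atom.x, atom.y, atom.z, atom.occupancy)
```

The two CB records share a residue but differ in the alternate-location letter (A and B), each with occupancy 0.50, which is how the format records a side chain observed in two conformations.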

Biochemical Data Growth