**Check Your Understanding #1:**

How many cell barcodes were detected in this dataset? 2711

How many genes are in the scRNA-seq data? 36601

How many peak intervals are included in the scATAC-seq data? 98319

**Check Your Understanding #2:**

Using the `.var` feature, what other metadata information is also captured about the peak intervals in the `atac` AnnData object? There are currently metadata columns called "gene_ids", "feature_types", "genome", and "interval". None of these are particularly interesting for us, but the point of this question is to make sure you understand that metadata columns are part of the data structure and that we will often store the values we calculate there.

**Check Your Understanding #3:**

Using the `.obs` feature, how many peak intervals does the first cell, `AAACAGCCAAATATCC-1`, have fragments in? 5415 different intervals with at least one fragment. Note again that our functions were often designed for genes, not peaks, so they might not produce ideal column names.

**Check Your Understanding #4:**

How many cells remain now? 2450

How many cells were removed by our filtering process? 261 = 2711 - 2450

**Check Your Understanding #5:**

Why is the second dimension of `tss` 2001? Each column is a relative position, from 1000 bp upstream of the TSS to 1000 bp downstream (1000 + 1 + 1000 = 2001 positions, counting the TSS itself).

What are typical values for the `tss_score` of barcoded cells? In the range of 3 to 8.

How do you interpret these scores? When we pile up our fragments, there are 3 to 8 times as many fragments from accessible regions overlapping the TSS than there are, on average, at the positions 1000 bp upstream and downstream of the TSS.

**Check Your Understanding #6:**

How many of the peak intervals were called as highly_variable with our current thresholds? 2827

Do you understand why peaks in different areas of the graph are being excluded (gray) from this set? The gray peak points to the far right are excluded because they are not detected at high enough average levels.
The peaks on the bottom were excluded because, compared to other peaks with similar average detection, their detection does not vary much from cell to cell.

**Check Your Understanding #7:**

Based on what we were told earlier about the genes MS4A1, IL7R, and KLF4: which of these scATAC-based leiden clusters would you guess is most likely to contain B cells? Cluster 5

What can we say about MS4A1 for that cluster? The chromatin regions around MS4A1 are generally much more open, and more often detected by our experimental assay, in the cells from Cluster 5 than in most other clusters.

**Check Your Understanding #8:**

How many cells and genes were originally in the scRNA-seq data? 2711 and 36601

How many cells and genes remain after filtering? 2636 and 21256

**Check Your Understanding #9:**

Based on the muon tutorial plots, what genes are typically highly expressed in B cells? 'CD79A', 'MS4A1', 'IGHM', 'IGHD', 'IL4R', and 'TCL1A' in naive B cells; 'CD79A', 'MS4A1', 'IGHM', and 'TCF4' in memory B cells.

Based on our plot, which one of our 12 clusters is most likely to contain B cells? Cluster 5

**Check Your Understanding #10:**

According to the previous violin plot, which other cell cluster besides cluster #5 might show a relationship between the differential expression of its genes and the accessibility of peak interval `chr2:231672391-231673225`? Cluster #11

**Check Your Understanding #11:**

Compare the pileups between the red and the green peak intervals. Do they match your intuition? Using the UCSC Genome Browser's GENCODE gene track, does the general distance of each peak from the CCR6 TSS match their distal/promoter label in the table above? It mostly matches: the peaks that are distal with large negative distances are far upstream, and the peaks that have relatively small distances are close to some of the TSSs in the GENCODE annotation.
If you look very closely, you will find that it does not match up exactly, meaning that this simple mapping to these annotations is not the one used by 10X Genomics to create their peak annotation table.

**Check Your Understanding #12:**

What ENCODE Candidate Cis-Regulatory Element (cCRE) is found below our peak? EH38E2524708 (chr6:167114005-167114354), and also EH38E2524709 (chr6:167114605-167114767).

Binding site motifs for what transcription factors are found in this cCRE? There are many sites, a large share of them related to the PAX, JUN, and USF families of transcription factors.

**Check Your Understanding #13:**

Do the pileups of our three user-uploaded tracks in the genome browser (peaks, cluster 5 fragments, matching background fragments) confirm this? Yes. It is more subtle to see, but for the red colored peaks there are stronger pileups in the matching background fragments than in the cluster 5 only fragments.

How do the logfold ratios in our table for this gene compare to our CCR6 example? The logfold ratios for KLF4 are mostly strongly negative (meaning the peaks around KLF4 are less accessible/detected in the cells of cluster #5 than in the rest of the cells), whereas for CCR6 the ratios were mostly positive, meaning increased accessibility in the cells of cluster #5.
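
To make the sign convention in #13 concrete, here is a minimal sketch of how a log fold ratio changes sign depending on whether a peak is more or less accessible in a cluster than in the remaining cells. It assumes a log2 ratio of mean accessibility; the actual table may use a different base or statistic. The function name `log2_fold_change`, the pseudocount, and the toy values are all hypothetical and are not taken from the dataset.

```python
import math


def log2_fold_change(cluster_mean, rest_mean, pseudocount=1e-9):
    """log2 ratio of mean accessibility in one cluster versus all other cells.

    The pseudocount is an illustrative guard against division by zero.
    """
    return math.log2((cluster_mean + pseudocount) / (rest_mean + pseudocount))


# Hypothetical mean accessibility values (cluster, rest); made up for illustration.
toy_peaks = {
    "peak_more_open_in_cluster": (0.40, 0.10),
    "peak_less_open_in_cluster": (0.05, 0.20),
}

for name, (cluster_mean, rest_mean) in toy_peaks.items():
    lfc = log2_fold_change(cluster_mean, rest_mean)
    direction = "more" if lfc > 0 else "less"
    print(f"{name}: log2 ratio = {lfc:+.2f} ({direction} accessible in the cluster)")
```

A positive value corresponds to the CCR6-like case (increased accessibility in the cluster), and a negative value corresponds to the KLF4-like case.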