# Algorithms & Complexity Seminar, MIT : Fall 2019

Organizers: Sitan Chen (sitanc AT mit DOT edu), Quanquan C. Liu (quanquan AT mit DOT edu), Nikhil Vyas (nikhilv AT mit DOT edu)

The Algorithms & Complexity Seminar for Fall 2019 will usually (unless otherwise stated) meet on Wednesdays 4pm-5pm in 32-G575 (Theory Lab on the 5th floor of the Stata Center). The style and format of these meetings are variable. Please feel free to contact the organizers to find out more details. To receive announcements of upcoming talks, subscribe to the mailing list by either visiting the mailman page or sending an empty email to compalgsem-subscribe@lists.csail.mit.edu.

## Upcoming Talks

• Wednesday, April 1, 2020: Shweta Jain (UCSC). [Time: 4-5pm, Zoom Link]
Topic. Counting cliques in real-world graphs.
Abstract. Cliques are important structures in network science that have been used in numerous applications including spam detection, graph analysis, graph modeling, and community detection, among others. Obtaining the counts of $k$-cliques in real-world graphs with millions of nodes and edges is a challenging problem due to combinatorial explosion. Essentially, as $k$ increases, the number of $k$-cliques goes up exponentially. Existing techniques are (typically) able to count $k$-cliques for only up to $k=5$.

In this work, we present two algorithms for obtaining $k$-clique counts that improve the state of the art, both in theory and in practice. Our first method is a randomized algorithm that gives a $(1+\epsilon)$-approximation for the number of $k$-cliques in a graph. Its running time is proportional to the size of an object called the Turán Shadow, which, for real-world graphs, is found to be small. In practice, this algorithm works well for $k\leq 10$ and gives orders of magnitude improvement over existing methods. This paper won the Best Paper Award at WWW 2017.

Our second, somewhat surprising, result is a simple but powerful algorithm called Pivoter that gives the exact $k$-clique counts for all $k$ and runs in $O(3^{n/3})$ time in the worst case. It uses a classic technique called pivoting that drastically cuts down the search space for cliques and builds a structure called the Succinct Clique Tree from which global and local (per-vertex and per-edge) $k$-clique counts can be easily read off. In practice, the algorithm is orders of magnitude faster than even other parallel algorithms and makes clique counting feasible for a number of graphs for which clique counting was infeasible before. This paper won the Best Paper Award at WSDM 2020.
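The pivoting recursion can be sketched in a few lines of Python. This is a minimal reading of the technique, not the authors' optimized Pivoter implementation: every root-to-leaf path of the recursion holds some vertices and accumulates some pivots, and each subset of the pivots together with the held vertices is a distinct clique, counted exactly once. The function name and the adjacency-set input format are my own choices; a simple graph without self-loops is assumed.

```python
from math import comb
from collections import defaultdict

def clique_counts(adj):
    """Exact k-clique counts for all k via pivoting.
    `adj` maps each vertex to the set of its neighbors (no self-loops)."""
    counts = defaultdict(int)  # counts[k] = number of k-cliques

    def rec(cand, held, pivots):
        if not cand:
            # Leaf: all held vertices plus any subset of the pivots
            # along this path form a clique, each counted exactly once.
            for j in range(pivots + 1):
                k = held + j
                if k:
                    counts[k] += comb(pivots, j)
            return
        # Pick the pivot with the most neighbors inside the candidate set.
        u = max(cand, key=lambda v: len(cand & adj[v]))
        rec(cand & adj[u], held, pivots + 1)  # u becomes a pivot
        remaining = set(cand)
        for v in cand - adj[u] - {u}:
            remaining.discard(v)  # avoid double-counting across branches
            rec(remaining & adj[v], held + 1, pivots)  # v is held

    rec(set(adj), 0, 0)
    return dict(counts)
```

For example, on $K_4$ this returns counts 4, 6, 4, 1 for $k = 1, \dots, 4$.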

## Past Talks

• Wednesday, September 18, 2019: Or Zamir (Tel Aviv University). [32-G575 Time: 4-5pm]
Topic. Faster k-SAT Algorithms Using Biased-PPSZ.
Abstract. The PPSZ algorithm, due to Paturi, Pudlak, Saks and Zane, is currently the fastest known algorithm for the $k$-SAT problem, for every $k>3$. For 3-SAT, a tiny improvement over PPSZ was obtained by Hertli. We introduce a biased version of the PPSZ algorithm with which we obtain an improvement over PPSZ for every $k\geq 3$. For $k=3$ we also improve on Hertli's result and get a much more noticeable, though still relatively small, improvement over PPSZ. In particular, for Unique 3-SAT, we improve the current bound from $1.308^n$ to $1.307^n$.

Joint work with Thomas Dueholm Hansen, Haim Kaplan and Uri Zwick.
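For intuition, plain (unbiased) PPSZ can be sketched as follows: process the variables in a uniformly random order, set a variable by unit-clause inference whenever some clause forces it under the current partial assignment, and flip a fair coin otherwise; repeat over many independent rounds. This is a hedged toy version, without the bounded-resolution strengthening or the biasing from the talk; the function name and the clause encoding (positive/negative integers for literals) are my own.

```python
import random

def ppsz(clauses, n, rounds=2000, seed=0):
    """Toy PPSZ for CNF over variables 1..n; clauses are lists of
    signed ints. Returns a satisfying assignment (dict) or None."""
    rng = random.Random(seed)
    for _ in range(rounds):
        a = {}
        for v in rng.sample(range(1, n + 1), n):  # random variable order
            forced = None
            for cl in clauses:
                if any(a.get(abs(l)) == (l > 0) for l in cl):
                    continue  # clause already satisfied
                free = [l for l in cl if abs(l) not in a]
                if len(free) == 1 and abs(free[0]) == v:
                    forced = free[0] > 0  # unit clause forces v
                    break
            # Guess uniformly when no clause forces the value.
            a[v] = forced if forced is not None else (rng.random() < 0.5)
        if all(any(a[abs(l)] == (l > 0) for l in cl) for cl in clauses):
            return a
    return None
```

The analysis of PPSZ bounds how often the coin flips (rather than inference) decide a variable, which drives the $c^n$ running-time exponent.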
• Thursday, October 31, 2019: Kira Goldner (Columbia University). [32-D463 (Star) Time: 3-4pm]
Topic. Mechanism Design under Interdependent Valuations.
Abstract. We study buyers with interdependent valuations: where one buyer's private information about an item impacts how much another buyer is willing to pay for it. In this setting, if a buyer misreports his own private information, he can impact not only what the auctioneer believes his own value is, but also what the auctioneer believes regarding other buyers' values as well, allowing him to essentially corrupt their values. As a result, the usual mechanism design tricks fall short, and welfare maximization is notoriously difficult. Almost all prior work in this setting requires a very strong "single-crossing" condition on the valuation functions in order to obtain any positive results.

We introduce two more natural notions -- first, a relaxed, parameterized notion of single-crossing, and second, a completely unrelated notion, one of submodularity over the private information of buyers -- that each separately enable good approximation guarantees to optimal welfare. These conditions, combined with the lens of approximation, allow us to go beyond not only the restrictive single-crossing notion, but to go far beyond the single-item setting and to give positive results for even the very general setting of combinatorial auctions.

Joint work with Alon Eden, Michal Feldman, Amos Fiat, and Anna Karlin.
• Monday, November 4, 2019: Piotr Sankowski (University of Warsaw). [32-G882 Time: 1:30-2:30pm]
Topic. Walking Randomly, Massively, and Efficiently.
Abstract. We introduce an approach that allows for efficiently generating many independent random walks in big graph models, such as the Massive Parallel Computation (MPC) model. We consider the case where the space per machine is strongly sublinear in the number of vertices. In this case, many natural approaches for graph problems struggle to overcome the $\log n$ MPC round complexity barrier. We design a PageRank algorithm that breaks this barrier even for directed graphs. In the undirected case, we start our random walks from the stationary distribution, so we approximately know the empirical distribution of their next steps. This way we can use a doubling approach to prepare continuations of sampled walks in advance. Our approach enables generating multiple random walks of length $\ell$ in $O(\log \ell)$ rounds on MPC. Moreover, we show that under the 2-Cycle conjecture this round complexity is asymptotically tight. One of the most important applications of random walks is PageRank computation. We show how to use our approach to compute approximate PageRank w.h.p. for a constant damping factor:

- in $O(\log\log n)$ rounds on undirected graphs (with almost linear total space),
- in $O(\log^2 \log n)$ rounds on directed graphs (with almost linear total space).

Our algorithm for directed graphs is based on a novel idea; we define a mixture of directed and undirected versions of the graph. This allows us to start our sampling from the undirected case and then to move step by step towards the directed case. In each step, we sample slightly more walks to be able to estimate the stationary distribution for the next step.

Joint work with Jakub Lacki, Slobodan Mitrovic, and Krzysztof Onak.
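A sequential toy simulation may make the doubling step concrete: keep a large pool of length-$L$ walks, then build length-$2L$ walks by stitching each walk to an unused pool walk that starts at its endpoint. This sketch ignores the independence bookkeeping that the actual MPC algorithm handles carefully; the function and parameter names are hypothetical, and `length` is assumed to be a power of two.

```python
import random

def doubled_walks(adj, count, length, seed=0):
    """Toy doubling: `adj` maps a vertex to its neighbor list; start
    with `count` single random steps per vertex, then repeatedly
    stitch walks to double their length. Returns the final pool."""
    rng = random.Random(seed)
    # Pool of length-1 walks (two vertices each) from every vertex.
    pool = [[v, rng.choice(adj[v])] for v in adj for _ in range(count)]
    L = 1
    while L < length:
        by_start = {}
        for w in pool:
            by_start.setdefault(w[0], []).append(w)
        stitched, used = [], set()
        for w in pool:
            # Find an unused continuation starting at w's endpoint.
            for c in by_start.get(w[-1], []):
                if id(c) not in used and c is not w:
                    used.add(id(c))
                    stitched.append(w + c[1:])
                    break
        pool = stitched
        L *= 2
    return pool
```

Each doubling is one "round" here, which is why $O(\log \ell)$ rounds suffice for walks of length $\ell$.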
• Wednesday, November 13, 2019: Talya Eden (MIT). [32-G575 Time: 4-5pm]
Topic. Approximately counting and sampling edges uniformly, parameterized by the arboricity.
Abstract. I will start by describing a very simple algorithm for approximating the number of edges in a graph. I will then turn to the problem of sampling edges from a distribution that is point-wise almost uniform. Given query access to a graph $G$ over $n$ vertices and $m$ edges and a bound $\alpha$ on the arboricity of $G$, both algorithms have $O((n\alpha)/m)$ expected query complexity, improving on the previous $O(n/\sqrt m)$ bound for both problems.

This talk is based on joint works with Dana Ron, Will Rosenbaum and C. Seshadhri.
• Thursday, November 14, 2019: Michael Kapralov (EPFL). [32-G882 Time: 4-5pm]
Topic. Dynamic Streaming Spectral Sparsification in Nearly Linear Time and Space.
Abstract. In the dynamic graph streaming model, an algorithm processes, in a single pass, a stream of edge insertions and deletions to an $n$-node graph. With limited space and runtime, the goal is to output a function or approximation of the graph at the end of the stream. A central challenge in the dynamic streaming model is developing algorithms for spectral graph sparsification, which is a powerful generic approximation method.

In this paper, we resolve the complexity of this problem up to polylogarithmic factors. Using a linear sketch we design a streaming algorithm that uses nearly linear space, processes each edge update in polylogarithmic time, and with high probability, recovers a spectral sparsifier from the sketch in nearly linear time. Prior results either achieved near-optimal space but quadratic recovery time [Kapralov et al. '14], or ran in subquadratic time but used polynomially suboptimal space [Ahn et al. '13].

Our main technical contribution is a novel method for recovering graph edges with high effective resistance from a linear sketch. We show how to do so in nearly linear time by 'bucketing' vertices of the input graph into clusters using a coarse approximation to the graph's effective resistance metric. Our result can be viewed as giving the first efficient $\ell_2$-sparse recovery algorithm for graphs.
• Wednesday, November 20, 2019: Audra McMillan (Boston University). [32-G575 Time: 4-5pm]
Topic. The Structure of Optimal Private Tests for Simple Hypotheses.
Abstract. Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. In this talk we’ll address a basic question about privately testing simple hypotheses: given two distributions $P$ and $Q$, and a privacy level $\epsilon$, how many i.i.d. samples are needed to distinguish $P$ from $Q$ subject to $\epsilon$-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we'll characterize this sample complexity up to constant factors in terms of the structure of $P$ and $Q$ and the privacy level $\epsilon$, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. This result is an analogue of the classical Neyman–Pearson lemma in the setting of private hypothesis testing. The characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.

Joint work with Clement Canonne, Gautam Kamath, Adam Smith and Jonathan Ullman.
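One simple $\epsilon$-differentially-private test in this spirit (a generic noisy clamped statistic, not the paper's exact optimal test): clamp each sample's log-likelihood ratio $\log(P(x)/Q(x))$ to $[-b, b]$, so that changing one sample moves the sum by at most $2b$, then add Laplace noise calibrated to that sensitivity. The function name and parameter defaults below are illustrative.

```python
import math
import random

def clamped_llr_test(samples, log_ratio, eps, clamp=1.0, seed=0):
    """Epsilon-DP test: sum of clamped per-sample log-likelihood
    ratios plus Laplace(2*clamp/eps) noise. One sample changes the
    sum by at most 2*clamp, so the Laplace mechanism gives eps-DP.
    Returns True if the noisy statistic favors P over Q."""
    rng = random.Random(seed)
    total = sum(max(-clamp, min(clamp, log_ratio(x))) for x in samples)
    # Sample Laplace noise by inverse-CDF from u in [-0.5, 0.5).
    u = rng.random() - 0.5
    scale = 2 * clamp / eps
    noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return total + noise > 0
```

With enough samples the clamped sums for $P$-data and $Q$-data separate by far more than the noise scale, so the test distinguishes the two hypotheses reliably.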
• Wednesday, December 4, 2019: Manuel Sabin (Berkeley). [32-G575 Time: 4-5pm]
Topic. Discriminatory and Liberatory Algorithms: Restructuring Algorithmic “Fairness”.
Abstract. The nascent field of Algorithmic Fairness recognizes that algorithms have the potential to perpetuate and codify systemic discrimination and attempts the noble goal of defining notions of “Fairness” that will mitigate this. The past year, however, has seen many critiques of the field’s direction and methodologies, illustrating how the field itself is in danger of perpetuating and codifying systems of discrimination.

This talk will review Fairness and Abstraction in Sociotechnical Systems, a work that outlines five sociotechnical “traps” that Algorithmic Fairness seems to routinely fall into. I will then present ongoing work where we introduce Discriminatory & Liberatory Algorithms (DLA): a framework to restructure the terminology, methodology, and role of Algorithmic “Fairness” through a sociotechnical lens. The merit of this will be argued via its lack of susceptibility to these five “traps.”
• Thursday, December 5, 2019: Noah Stephens-Davidowitz (MIT). [32-D463 (Star) Time: 4-5pm]
Topic. SETH-Hardness of Coding Problems.
Abstract. We show that, assuming a common conjecture in complexity theory, there are "no non-trivial algorithms" for the two most important problems in coding theory: the Nearest Codeword Problem (NCP) and the Minimum Distance Problem (MDP). Specifically, for any constant $\epsilon > 0$, there is no $N^{1-\epsilon}$-time algorithm for codes with $N$ codewords. We also show similar results for variants of these problems, such as approximate versions, NCP with preprocessing, etc. These results are inspired by earlier work showing similar results for the analogous lattice problems (joint works with Huck Bennett, Sasha Golovnev, and Divesh Aggarwal), but the proofs for coding problems are far simpler.

Based on joint work with Vinod Vaikuntanathan, FOCS 2019.
• Wednesday, February 5, 2020: Christian Sohler (University of Cologne). [32-G575 Time: 3-4pm]
Topic. Property Testing in Planar Graphs: Bipartiteness and Subgraph Freeness.
Abstract. Property testing in planar bounded degree graphs is well understood. However, the algorithms depend heavily on the assumption that the input graph has bounded degree and cannot be used for general planar graphs.

In this line of work we study testability of properties of general planar graphs. We first show that a random walk based approach can be used to obtain a bipartiteness tester with constant query complexity. Then we explain a modified random neighborhood exploration that can be used to test subgraph freeness.

(Joint work with Artur Czumaj, Morteza Monemizadeh and Krzysztof Onak)
• Tuesday, February 11, 2020: Jason Li (CMU). [32-G575 Time: 4-5pm]
Topic. The Karger-Stein Algorithm is Optimal for $k$-cut.
Abstract. In the $k$-cut problem, we are given an edge-weighted graph and want to find the least-weight set of edges whose deletion breaks the graph into $k$ connected components. Algorithms due to Karger-Stein and Thorup showed how to find such a minimum $k$-cut in time approximately $O(n^{2k-2})$. The best lower bounds come from conjectures about the solvability of the $k$-clique problem and a reduction from $k$-clique to $k$-cut, and show that solving $k$-cut is likely to require time $\Omega(n^k)$. Our recent results have given special-purpose algorithms that solve the problem in time $n^{1.98k + O(1)}$, and ones that have better performance for special classes of graphs (e.g., for small integer weights).

In this work, we resolve the problem for general graphs, by showing that for any fixed $k \geq 2$, the Karger-Stein algorithm outputs any fixed minimum $k$-cut with probability at least $\widehat{O}(n^{-k})$, where $\widehat{O}(\cdot)$ hides a $2^{O(\ln \ln n)^2}$ factor. This also gives an extremal bound of $\widehat{O}(n^k)$ on the number of minimum $k$-cuts in an $n$-vertex graph and an algorithm to compute a minimum $k$-cut in similar runtime. Both are tight up to $\widehat{O}(1)$ factors.

The first main ingredient in our result is a fine-grained analysis of how the graph shrinks---and how the average degree evolves---under the Karger-Stein process. The second ingredient is an extremal result bounding the number of cuts of size at most $(2-\delta)\,\mathrm{OPT}/k$, using the Sunflower lemma.
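The contraction process the analysis refers to can be sketched directly: contract weight-proportional random edges until $k$ super-vertices remain, and return the induced $k$-cut; repeating many independent runs and keeping the best cut boosts the success probability. This is a toy weighted version of the contraction step, not the full recursive Karger-Stein algorithm; the edge-list format and function names are my own.

```python
import random

def contract_to_k(edges, n, k, rng):
    """One contraction run on vertices 0..n-1; `edges` is a list of
    (u, v, w) triples. Returns the weight of the induced k-cut."""
    parent = list(range(n))
    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    comps = n
    weights = [w for _, _, w in edges]
    while comps > k:
        u, v, _ = rng.choices(edges, weights=weights)[0]
        ru, rv = find(u), find(v)
        if ru != rv:  # skip self-loops inside a super-vertex
            parent[ru] = rv
            comps -= 1
    return sum(w for u, v, w in edges if find(u) != find(v))

def min_k_cut(edges, n, k, trials=300, seed=0):
    """Best cut over many independent contraction runs."""
    rng = random.Random(seed)
    return min(contract_to_k(edges, n, k, rng) for _ in range(trials))
```

The talk's result says each run outputs any fixed minimum $k$-cut with probability roughly $n^{-k}$ (up to the $\widehat{O}$ factors), improving the classical $n^{-2(k-1)}$ bound, so fewer repetitions suffice.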
• Wednesday, February 12, 2020: Mark Sellke (Stanford). [32-G575 Time: 4-5pm]
Topic. Chasing Convex Bodies.
Abstract. I will explain our recent understanding of the chasing convex bodies problem posed by Friedman and Linial in 1993. In this problem, a player receives a request sequence $K_1,\dots,K_T$ of convex sets in $d$-dimensional space and moves online into each requested set. The player's movement cost is the length of the resulting path. Chasing convex bodies asks for an online algorithm with cost competitive against the offline (in hindsight) optimal path. This is both a challenging metrical task system and (equivalently) a competitive-analysis view of online convex optimization.

This problem was open for $d>2$ until recently but has since been solved twice. The first solution gives a $2^{O(d)}$-competitive algorithm, while the second gives a nearly optimal $\min(d,\sqrt{d\log T})$-competitive algorithm for $T$ requests. The latter result is based on the Steiner point, which is the exact optimal solution to a related geometric problem called Lipschitz selection and dates from 1840. I will briefly outline the first solution and fully explain the second.
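Since the Steiner point of a convex body is the average, over a uniformly random direction, of the body's extreme point in that direction, it can be estimated by Monte Carlo. The sketch below is a 2-D toy under that characterization, for a polytope given by its vertices; names are illustrative.

```python
import math
import random

def steiner_point(vertices, samples=20000, seed=1):
    """Monte Carlo Steiner point of a 2-D polytope: average the
    vertex maximizing <theta, x> over uniform directions theta."""
    rng = random.Random(seed)
    sx = sy = 0.0
    for _ in range(samples):
        t = rng.uniform(0.0, 2.0 * math.pi)
        dx, dy = math.cos(t), math.sin(t)
        # Extreme point of the polytope in direction (dx, dy).
        x, y = max(vertices, key=lambda p: p[0] * dx + p[1] * dy)
        sx += x
        sy += y
    return sx / samples, sy / samples
```

For a centrally symmetric body such as the square with vertices $(\pm 1, \pm 1)$, the Steiner point is the center, and the estimate concentrates there.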

Partially based on joint work with Sébastien Bubeck, Bo'az Klartag, Yin Tat Lee, and Yuanzhi Li.