Term
How should passwords be stored? |
|
Definition
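- Don't store passwords as clear text - hash the password (with a salt) and store the hashed value - don't use SHA-1 or another fast general-purpose hash (it's weak for this); prefer a deliberately slow password hash such as bcrypt, scrypt, Argon2, or PBKDF2 |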
|
|
Term
What does a salt do when hashing passwords? |
|
Definition
- salt prevents identical passwords from having identical hashes |
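
A minimal sketch of salted password storage, assuming Python's standard-library PBKDF2 (hashlib.pbkdf2_hmac) as the slow password hash; bcrypt or Argon2 would need third-party packages. The function names and the iteration count are illustrative assumptions. Because each call draws a fresh random salt, hashing the same password twice gives different stored values.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Work factor is an assumption: set it as high as your login latency allows.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, hash) to store. A new random salt is generated per password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, stored)


salt1, h1 = hash_password("hunter2")
salt2, h2 = hash_password("hunter2")
print(h1 != h2)                                # True: same password, different salts
print(verify_password("hunter2", salt1, h1))   # True
```

Store the salt next to the hash; it is not a secret, its job is only to make every stored hash unique.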
|
|
Term
What are some tradeoffs in security design |
|
Definition
- (Services offered) versus level of security
- (Ease of use) versus security
- (Cost of security) versus cost of loss |
|
|
Term
Examples of network threats |
|
Definition
- Unauthorized access - Impersonation - Denial of service |
|
|
Term
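What is form field code injection? |
|
Definition
HTML forms whose input is not validated are open to code injection, e.g. SQL injection (input that ends up being executed as part of a database query)
ALWAYS validate and escape form input (or, for SQL, pass it as a query parameter) before passing the data to the next page or script
ALWAYS assume that any data input by a user is malicious |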
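
A small sketch using Python's built-in sqlite3 module with a throwaway in-memory table (the table and function names are hypothetical). It contrasts pasting input into the SQL text with passing it as a ? parameter, which is the usual way to make sure input is treated as data rather than as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])


def find_user_unsafe(name: str):
    # BAD: user input is pasted into the SQL text, so it can change the query itself.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(name: str):
    # GOOD: the ? placeholder sends the input as a value, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


malicious = "nobody' OR '1'='1"
print(find_user_unsafe(malicious))  # both rows come back: the OR clause was injected
print(find_user_safe(malicious))    # []
```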
|
|
Term
|
Definition
SSL handshake (RSA) without keyless SSL Public Key Encryption Operation Public Key Signature Operation Authorization based on 3rd party authentication |
|
|
Term
What is the Same-Origin Policy? |
|
Definition
Same-Origin Policy - A script on your webpage may only read responses from its own origin (same scheme, host, and port), not from other servers/webpages. The browser enforces this. |
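
The comparison the browser makes can be sketched as below, assuming the standard definition of an origin as the (scheme, host, port) triple. The helper names are hypothetical, and special cases such as file: URLs are ignored.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Ports implied when a URL doesn't spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, parts.hostname or "", port


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin(a) == origin(b)


print(same_origin("https://example.com/a", "https://example.com:443/b"))  # True
print(same_origin("https://example.com", "https://api.example.com"))      # False
```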
|
|
Term
What is CORS? |
|
Definition
Cross-Origin Resource Sharing (CORS) - When a webpage wants to access a resource on another origin, the browser first asks that server (via the Origin and Access-Control-* headers) whether it trusts the origin of the page I'm on to make that request. |
|
|
Term
What is a pre-flight request? |
|
Definition
- Part of CORS - Used to check whether the server will permit a cross-origin request with specific HTTP methods or headers before the actual request is made. |
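
A rough server-side sketch of answering a preflight. The OPTIONS request carries Origin and Access-Control-Request-Method (and optionally Access-Control-Request-Headers), and the server's Access-Control-Allow-* response headers tell the browser whether it may send the real request. The allowlists and the function name are hypothetical, and incoming header names are assumed to be already lower-cased.

```python
from typing import Dict, Tuple

# Hypothetical allowlists for this sketch.
ALLOWED_ORIGINS = {"https://app.example.com"}
ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}
ALLOWED_HEADERS = {"content-type", "authorization"}


def handle_preflight(headers: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Answer an OPTIONS preflight request with (status, response headers)."""
    origin = headers.get("origin", "")
    method = headers.get("access-control-request-method", "")
    requested = {
        h.strip().lower()
        for h in headers.get("access-control-request-headers", "").split(",")
        if h.strip()
    }
    if origin not in ALLOWED_ORIGINS or method not in ALLOWED_METHODS:
        return 403, {}  # no CORS headers: the browser won't send the real request
    if not requested <= ALLOWED_HEADERS:
        return 403, {}
    return 204, {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": ", ".join(sorted(ALLOWED_HEADERS)),
        "Access-Control-Max-Age": "600",  # browser may cache this answer for 10 minutes
    }


status, resp = handle_preflight({
    "origin": "https://app.example.com",
    "access-control-request-method": "PUT",
    "access-control-request-headers": "Content-Type",
})
print(status)  # 204
```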
|
|
Term
What is a CSRF token? |
|
Definition
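When a website accepts form input, it just gets an HTTP request and can't tell whether that request came from a user actually pressing the button on its own form; a malicious site can make the browser send a forged request (cross-site request forgery, CSRF). To solve this, the server generates a random token (CSRF token) for that instance of the form and hides it in the form itself; submissions without the matching token are rejected. |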
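
A minimal sketch of issuing and checking a CSRF token, assuming a hypothetical in-memory dict that maps a session id to the token of the form most recently rendered for it (real web frameworks ship their own CSRF protection). The token comes from secrets.token_urlsafe and is compared with hmac.compare_digest to avoid timing leaks.

```python
import hmac
import secrets
from typing import Dict

# Hypothetical server-side store: session id -> token of the last form rendered for it.
session_tokens: Dict[str, str] = {}


def render_form(session_id: str) -> str:
    """Generate a fresh random token and hide it in the form the server sends out."""
    token = secrets.token_urlsafe(32)
    session_tokens[session_id] = token
    return (
        '<form method="POST" action="/transfer">'
        f'<input type="hidden" name="csrf_token" value="{token}">'
        "<button>Send</button></form>"
    )


def accept_submission(session_id: str, form: Dict[str, str]) -> bool:
    """Accept only if the submitted token matches the one we issued.

    A forged cross-site request can make the browser send cookies,
    but the attacker can't read our page, so it can't know the token.
    """
    expected = session_tokens.get(session_id)
    submitted = form.get("csrf_token", "")
    return expected is not None and hmac.compare_digest(
        expected.encode(), submitted.encode()
    )


page = render_form("session-1")
token = session_tokens["session-1"]
print(accept_submission("session-1", {"csrf_token": token}))  # True
print(accept_submission("session-1", {}))                     # False
```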
|
|
Term
What is XSS and how do you defend against it? |
|
Definition
- Cross-Site Scripting (XSS) - an attacker gets their script to run in other users' browsers by injecting it through user-generated input. Don't trust user-generated input
- My response will include a Content-Security-Policy header
Before loading a resource or running a script, the browser checks whether it's in the whitelist. The easiest way to do XSS is to inject inline styles and scripts. Inline content can be disabled entirely via the CSP header |
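
Both defenses can be sketched together: escape user input before putting it into HTML, and send a Content-Security-Policy header that leaves out 'unsafe-inline', so the browser refuses to run inline scripts. The policy string and the function name are assumptions for illustration, not a recommended production policy.

```python
import html
from typing import Dict, Tuple

# Example policy (an assumption): only same-origin resources and no 'unsafe-inline',
# so the browser refuses inline <script> blocks and inline styles.
CSP = "default-src 'self'; script-src 'self'; style-src 'self'"


def render_comment(user_text: str) -> Tuple[str, Dict[str, str]]:
    """Build an HTML fragment plus response headers for a user-supplied comment."""
    body = f"<p>{html.escape(user_text)}</p>"  # <, >, &, and quotes become entities
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Security-Policy": CSP,
    }
    return body, headers


body, headers = render_comment("<script>alert(1)</script>")
print(body)  # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```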
|
|
Term
What is a Content Security Policy (CSP)? |
|
Definition
Content Security Policy (CSP) - included with the response as a header - before loading something, the browser checks if it's in the CSP whitelist |
|
|
Term
What is SSL Hijacking in a Man-in-the-Middle (MITM) attack? |
|
Definition
- SSL Hijacking occurs when an attacker intercepts and manipulates the communication between a client and a server during the SSL/TLS handshake process. |
|
|
Term
What is a DDoS attack? |
|
Definition
Distributed Denial of Service (DDoS) Attack
A DDoS attack uses a large number of “bots” on infected computers to make more requests than a server can handle, rendering it incapable of responding to legitimate requests |
|
|
Term
Ways of limiting DDoS attacks? |
|
Definition
- Rate Limiting
- Traffic Filtering |
|
|
Term
What is Rate limiting in DDoS mitigation? |
|
Definition
One form of mitigation is only responding to a certain number of requests per second from a given IP |
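
A minimal per-IP sketch of that idea using a fixed one-second window (a simpler relative of the token-bucket and sliding-window limiters used in practice). The class name is hypothetical, the clock is injectable so the behavior can be tested, and stale entries are never cleaned up, which a real limiter would need to handle.

```python
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Allow at most `limit` requests per IP in each `window`-second window."""

    def __init__(self, limit: int, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # ip -> (window number, requests seen in that window)
        self.state: Dict[str, Tuple[int, int]] = {}

    def allow(self, ip: str) -> bool:
        current = int(self.clock() // self.window)
        window_id, count = self.state.get(ip, (current, 0))
        if window_id != current:
            window_id, count = current, 0  # a new window started: reset the count
        if count >= self.limit:
            return False  # over the limit: drop it or answer 429 Too Many Requests
        self.state[ip] = (window_id, count + 1)
        return True


now = [0.0]  # fake clock for the example
limiter = RateLimiter(limit=3, clock=lambda: now[0])
print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
```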
|
|
Term
What is Traffic filtering in DDoS mitigation? |
|
Definition
DDoS attacks can be mitigated by filtering out traffic from a maintained list of IP addresses known to be infected
- Use a reverse proxy server - CAPTCHA |
|
|
Term
What is a Reverse Proxy Server? How does it help with DDoS mitigation? |
|
Definition
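A server that sits in front of your backend servers and receives client requests on their behalf, forwarding them to the backends. It helps solve DDoS by caching responses and load balancing traffic across servers (e.g. use Cloudflare) |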
|
|
Term
What are race conditions? |
|
Definition
When two or more threads/processes attempt to read or write a shared resource at the same time, the outcome may vary based on the scheduling and order of execution.
Race conditions arise from non-deterministic interleaving of operations and a lack of proper synchronization mechanisms.
Race conditions can lead to wrong or inconsistent results, like data corruption, unexpected behavior, crashes, or other system failures. - Hard to test because the behavior depends on the specific timing and interleaving of operations. |
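
A small Python illustration, assuming CPython's threading module: counter += 1 is a read-modify-write, so another thread can run between the read and the write and an update gets lost. Whether the unsafe version actually loses updates on a given run depends on timing (which is exactly why races are hard to test); the lock makes the result deterministic. The function names are hypothetical.

```python
import threading

counter = 0
lock = threading.Lock()


def unsafe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        counter += 1  # read, add, write: another thread may run in between


def safe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:  # only one thread at a time does the read-modify-write
            counter += 1


def run(worker, threads: int = 4, n: int = 50_000) -> int:
    """Reset the shared counter, run `threads` workers concurrently, return the total."""
    global counter
    counter = 0
    pool = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter


print(run(safe_increment))    # 200000
print(run(unsafe_increment))  # may be 200000 or less, depending on timing
```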
|
|
Term
How do you detect race conditions? |
|
Definition
To detect race conditions, develop robust test cases that cover many scenarios and concurrent execution paths. Random inputs and stress testing can increase the likelihood of triggering a race condition. If weird behavior arises, data corruption occurs, or crashes happen, there might be a race condition.
Perform a thorough review of the code, keeping in mind where shared resources are, like shared variables or data structures. Also look for scenarios where missing or incorrect synchronization might lead to a race condition.
Static code analysis tools are made to detect potential race conditions. They look for potential concurrency issues, improper synchronization, or shared resource access violations.
- Debugging and Logging
Concurrency analysis tools detect race conditions in concurrent programs. They use dynamic analysis, runtime monitoring, and state-space exploration to find potential race conditions.
Profiling and Performance Analysis: look for unexpected behavior that may indicate race conditions. Analyze resource utilization, synchronization patterns, and thread/process interactions to identify potential concurrency issues. |
|
|
Term
How do you debug race conditions? |
|
Definition
Reproduce the Issue: Try to create a reliable way to reproduce the race condition. Identify the specific inputs or conditions that trigger the issue. This can involve manipulating timing, data, or other factors to increase the likelihood of the race condition occurring.
Analyze Error Symptoms: Carefully analyze the symptoms and observe the behavior when the race condition occurs. Look for unexpected output, data corruption, crashes, or any other abnormal behavior. Collect as much information as possible about the observed symptoms to aid in the debugging process.
Logging and Instrumentation: Add logging statements or instrumentation to your code to trace the execution flow and capture relevant information during the race condition occurrence. Log thread IDs, shared resource access, timestamps, or other relevant data to aid in understanding the sequence of events leading to the issue.
Stress Testing: Introduce stress testing by increasing the workload, concurrency, or load on the system. This can help uncover race conditions that occur under high load or specific timing conditions. Stress testing can also expose race conditions that might be more challenging to reproduce in controlled environments.
Binary Search Method: If you suspect a specific section of code is responsible for the race condition, you can use a binary search method to isolate the problematic area. Temporarily disable or comment out parts of the code to narrow down the location of the race condition and identify the root cause step-by-step. |
|
|
Term
What are three types of design patterns? |
|
Definition
Creational Patterns: patterns regarding making objects without introducing additional complexity
Structural Patterns: patterns regarding how to organize classes and objects to form larger structures and provide new functionality. Keep structures flexible and efficient
Behavioral Patterns: patterns of communication between objects. |
|
|
Term
What does a project manager do? |
|
Definition
Project Manager: Make sure projects run smoothly, handle scheduling, resource allocation, and scope to meet project goals.
• Product Manager: Own "customer"-facing "product," prioritize features, gather requirements, and define vision and strategy for the product.
• Program Manager: Oversees a portfolio of projects, ensures alignment with business goals, manages dependencies across projects. |
|
|
Term
What is a project plan? |
|
Definition
- For a plan-driven development project
the project plan examines available resources and makes a schedule for doing everything. |
|
|
Term
What are the sections of a project plan? |
|
Definition
Plan sections: - Introduction - Project organization - Risk analysis - Hardware and software resource requirements - Work breakdown - Project schedule - Monitoring and reporting mechanisms |
|
|
Term
What is a Gantt chart? |
|
Definition
A Gantt chart is a visual project management tool that represents a timeline of tasks or activities. |
|
|
Term
What are OKRs and how do they work? |
|
Definition
OKRs (Objectives and Key Results) are a goal-setting framework designed to define and track objectives and their measurable outcomes.
SMART Goals: Specific, Measurable, Achievable, Relevant, Time-Bound.
Quarterly Process: 1. Review the previous quarter’s Key Results (achievements and lessons learned). 2. Define and lock in next quarter’s Objectives and Key Results (cannot be changed).
Example:
Objective: Attract more users to new visual similarity recommendations. Key Results: • Increase in Customer Lifetime Value (CLV). • A/B test results (Clicks, Add to Cart, Conversions). • Year-over-Year (YOY) improvements in these metrics. • Offline model evaluation and deployment of new model version.
Category: Goal Setting / Performance Management |
|
|
Term
What is RACI? |
|
Definition
(Who's) Responsible / (Who's) Accountable / (Who's) Consulted / (Who's) Informed |
|
|
Term
What is a priority grid? |
|
Definition
A graph (impact x effort) that is color coded to show what is hard and easy. In other words what to do later and what to do next. |
|
|
Term
What are types of maintenance? |
|
Definition
Corrective Maintenance: - Fixing defects and errors discovered in the software.
Adaptive Maintenance: - Modifying the software to accommodate changes in the environment or external factors.
Perfective Maintenance: - Improving the software's quality, performance, or maintainability.
Preventive Maintenance: - Proactively addressing potential issues and preventing future problems. |
|
|
Term
List some tools for project management. |
|
Definition
- RACI - OKRs - Gantt - project plan - priority grid |
|
|
Term
What kinds of data collection tools do we have for managing a project's performance? |
|
Definition
- Profiling: involves collecting data about the program's execution to identify performance bottlenecks, resource usage, and other metrics that can help in optimizing the application.
- Benchmarking: is a process of evaluating and comparing the performance, efficiency, or other metrics of a software system, hardware component, or algorithm against a set of predefined standards.
- Monitoring: refers to the continuous observation and measurement of various aspects of the system's performance, behavior, and health in real-time. |
|
|
Term
What is the ELK stack? |
|
Definition
Elasticsearch: - Distributed - Real-time - Search and analytics engine - stores, searches, and analyzes large amounts of data quickly.
Logstash: - data processing pipeline - ingests, transforms, and enriches log data from various sources. - log files, message queues, databases
Kibana: - web-based data visualization and exploration tool - works with Elasticsearch - user friendly |
|
|
Term
What is reliability? |
|
Definition
- The probability of failure-free system operation over a specified time in a given environment for a given purpose |
|
|
Term
What is availability? |
|
Definition
- The probability that a system, at a point in time, will be operational and able to deliver the requested services |
|
|
Term
What are the differences between human errors, system faults, system errors, and system failures? |
|
Definition
- Human error or mistake - Human behavior that results in the introduction of faults into a system
- System fault - A characteristic of a software system that can lead to a system error
- System error - An erroneous system state that can lead to system behavior that is unexpected by system users
- System failure - An event that occurs at some point in time when the system does not deliver a service as expected by its users |
|
|
Term
Name the methods used for fault management |
|
Definition
- Fault avoidance - The system is developed in such a way that human error is avoided and thus system faults are minimized
- Fault Detection - Verification and validation techniques are used to increase the probability of detecting and correcting errors before the system goes into service.
- Fault tolerance - the system is designed so that faults in the delivered software do not result in system failure |
|
|
Term
How do you prove that a formal model is true for all valid inputs to the system? |
|
Definition
- you can use an automated theorem prover (ATP) or another formal verification tool to prove that the model holds for all valid inputs to the system, rather than testing inputs one at a time |
|
|
Term
What is the CI (continuous integration)/CD (continuous delivery/deployment) pipeline? |
|
Definition
Plan —> Code —> Build —> Test —> Release —> Deploy —> Operate —> Monitor —> Plan |
|
|
Term
What is the difference between CI and CD? |
|
Definition
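- CI - automatically builds, tests, and integrates changes within a shared repository THEN —> - CD - automatically releases those changes: continuous delivery keeps every change ready to release, and continuous deployment deploys code changes to customers directly |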
|
|
Term
What are DAG runners and why use them? |
|
Definition
- DAG runners orchestrate the execution of tasks (which are organized in a DAG) based on their dependencies to ensure proper execution.
- They enable automation, scheduling, parallelism, monitoring, and error handling |
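
A minimal sketch of the core loop of a DAG runner, using Python's standard graphlib.TopologicalSorter (Python 3.9+). The task names are hypothetical; real DAG runners add scheduling, retries, monitoring, and parallel workers on top of this dependency ordering.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "extract": set(),
    "clean": {"extract"},
    "features": {"clean"},
    "report": {"clean"},
    "train": {"features"},
}


def run_dag(graph: Dict[str, Set[str]], run_task: Callable[[str], None]) -> List[str]:
    """Run every task after all of its dependencies; return the execution order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    order: List[str] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        for task in ready:  # independent tasks: a real runner could run these in parallel
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order


print(run_dag(PIPELINE, lambda task: None))
# ['extract', 'clean', 'features', 'report', 'train']
```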
|
|
Term
What is docker compose? What are its key features? |
|
Definition
- Docker Compose - tool for defining and managing multi-container applications.
Key features: - Service definitions: Docker compose uses a compose file to define the services that make up your application. Each service represents a containerized component of the application stack like a web server, database, or worker.
- Orchestration: Docker compose orchestrates the creation, configuration, and management of multiple containers defined in the compose file. Ensures proper coordination and connectivity between the containers.
- Easy configuration and deployment: You can define your application's configuration and dependencies in a single file, making it easy to deploy your application consistently across different environments.
- Network and volume management: simplifies the management of networks and volumes required by your application. Lets you make custom networks for container communication and manage shared volumes for data persistence. |
|
|
Term
Describe the relationship between threads, cores, and processes |
|
Definition
- A thread = smallest unit of execution. Consists of a thread ID, program counter, register set, and a stack.
- Core = independent processing unit within a CPU that can execute instructions, read and write to memory, and perform I/O operations. Ignoring hyper-threading (simultaneous multithreading), each core can only run one thread at a time.
- Process = an instance of a computer program that is being executed. Contains the program code & current activity.
- Each process runs in its own memory space & needs inter-process communication (IPC), such as pipes, sockets, or shared memory, to communicate with other processes. A process may contain multiple threads, and each thread belongs to exactly one process. Threads within the same process share its memory and state
Extra fun fact: each core runs 1 thread at a time, so to do more things at once than there are cores, the OS switches between threads really really fast (context switching) |
|
|
Term
What is the difference between I/O bound processes and CPU bound processes? |
|
Definition
- I/O bound processes - limited by the rate at which data is transferred btwn system and external devices like hard drives, networks, or user input
- CPU bound process - limited by the rate at which the processor can compute |
|
|
Term
What is the difference between concurrency and parallelism? |
|
Definition
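- Concurrency = structuring a program so that multiple tasks make progress in overlapping time periods (not necessarily at the same instant), e.g. using asynchronous programming to avoid blocking operations, allowing the CPU to perform other tasks during I/O operations.
- Parallelism = actually executing multiple computations at the same time on multiple cores, e.g. via multi-threading or multi-processing, which allows simultaneous data processing. |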
|
|
Term
What are the methods used to achieve concurrency? |
|
Definition
- Asynchronous programming - program starts an I/O operation, then yields execution. When the I/O operation is complete, execution can be resumed. Allows single threads to handle many concurrent I/O bound tasks.
- Event-driven programming - Driven by events like user actions or messages from other programs. Allows the system to react to I/O events as they occur instead of constantly polling for I/O status or waiting on I/O operations.
- Non-blocking I/O and callbacks - Involves starting an I/O operation, then doing other work. When the I/O operation is done, a callback function is called to handle the rest. Ensures that the CPU isn't waiting for I/O operations to complete and can continue doing other work.
- Cooperative multitasking/coroutines - Coroutines are subroutines that can suspend and resume execution at certain points (multiple entry points), so tasks voluntarily hand control to each other, enabling cooperative multitasking.
- Actor Model - Treats "actors" as the universal primitives of concurrent computation: every concurrent entity is an actor, and actors communicate only by sending messages. In response to a message it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.
- Data Parallelism - Involves distributing subsets of the same data across different cores or threads and computing on them in parallel. |
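
A minimal asyncio sketch of asynchronous programming: one thread starts three waits and yields while each is pending, so the waits overlap. asyncio.sleep stands in for real I/O, and the delays are arbitrary.

```python
import asyncio
import time
from typing import List


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a slow I/O call; awaiting it hands control back to the event loop.
    await asyncio.sleep(delay)
    return name


async def main() -> List[str]:
    # Start all three "requests" and wait for them together on a single thread.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)            # ['a', 'b', 'c']
print(f"{elapsed:.1f}s")  # about 0.2s rather than 0.6s, because the waits overlap
```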
|
|
Term
Name some major problems with parallelism |
|
Definition
- Deadlock: Everybody is waiting on everybody else, so nobody can proceed - Starvation: Someone is waiting for something and never gets it (OS schedulers mitigate this, e.g. with aging, but it can still happen with locks and other shared resources) |
|
|
Term
What makes a good split point? |
|
Definition
Cohesion: - good split points result in cohesive components with a clear purpose - minimize dependencies on external modules
Loose Coupling: - minimize dependencies between components for loose coupling - encapsulate interactions and dependencies - let components change independently
Testability: - find split points that allow testing in isolation - should be able to test on an airplane. |
|
|
Term
What is Client-Server Architecture? |
|
Definition
request vs response
Client: - component - makes requests
- clients actively initiate transactions
Server: - component - fulfills requests
- servers react to client requests |
|
|
Term
What is a multitier (N-tier) architecture? |
|
Definition
A multitier (N-tier) architecture is an expansion of the 3-tier architecture
extra tiers do: - Replication of the function of a tier - Specialization of function within a tier - Portal services, like handling incoming web traffic |
|
|
Term
What does git merge --squash do? |
|
Definition
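A merge --squash creates a single new commit (once you run git commit) that contains all the changes from the other branch. The other branch's individual commits are NOT kept in the history, and the result is not recorded as a merge of the two branches |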
|
|
Term
What is a process framework? |
|
Definition
- A set of guidelines, work products, and tools that attempt to facilitate a process |
|
|
Term
What are the steps of the SDLC (Software Development Life Cycle)? |
|
Definition
- Define, design, develop, deliver, MAINTAIN |
|
|
Term
What is the difference between a prescriptive and agile process? |
|
Definition
- Prescriptive processes - all of the process activities are planned in advance. Progress is measured against this plan.
- Agile processes - planning is incremental & it’s easier to change the process to meet changing requirements |
|
|
Term
Describe the waterfall model |
|
Definition
- The Waterfall Model:
Reqs definition —> system & software design —> implementation & unit testing —> integration & system testing —> operation & maintenance
- CONS - requires complete upfront specifications, tends toward over-engineering, integration & testing happen late, requires reliable upfront estimates & schedules, limited value for software which tends to change pretty fast |
|
|
Term
Describe incremental development and name its pros/cons |
|
Definition
- Incremental Development - Outline Description —> Specification(—> initial version) /Development (—> intermediate versions) /Validation(—> final version)
PROS - much more agile than waterfall. Easier to get customer feedback. CONS - as development continues system structure gets weaker. Hard to tell where the process is at in development. |
|
|
Term
Describe the XP release cycle |
|
Definition
- Select user stories for this release —> break down stories to tasks —> plan release —> develop/integrate/test software —> release software —> evaluate system —> repeat
XP Hierarchy: Theme/Initiative —> Epic —> User Story —> Task —> Subtask/Ticket |
|
|
Term
What is incremental delivery? |
|
Definition
- Incremental Delivery - Deploy an increment for use by end-users —> more realistic evaluation about practical use of software. Difficult to implement for replacement systems (increments have less functionality than the system being replaced)
Define outline requirements —> assign requirements to increments —> design system architecture —> develop system increment —> validate increment —> integrate increment —> validate system —> deploy increment —> (if system incomplete) develop system increment |
|
|
Term
What is the difference between containerization and virtualization? |
|
Definition
Containerization vs Virtualization: - Virtual machine = fake hardware | Strong isolation, configurable resource utilization, dedicated hardware resources, live migration - Containers = fake OS | Lightweight, faster startup & scaling, efficient resource utilization, isolation w/o overhead |
|
|
Term
What are goals vs requirements vs non goals |
|
Definition
- Goals - what problems is it supposed to solve? - Requirements - Non-functional, functional, domain - what is it supposed to achieve?, what is it supposed to actually do?, what does it have to do? (Eg. Compliance with something) - Non-Goals - What is out of scope for this project? |
|
|
Term
What's the difference between user requirements, system requirements, and use case? |
|
Definition
- User Requirements - Statements in natural language + diagrams of the services the system provides & its operational constraints. Written for customers - System Requirements - A structured document setting out detailed descriptions of the system's functions, services, and operational constraints. Basically a contract btwn client & contractor - Use Case - Describes how a system will behave in broad areas. (Eg. The use case for acc management includes the user changing a password.) |
|
|
Term
Requirements validation vs Requirements verification? |
|
Definition
- Requirements Validation - Am I building the right product?
- Requirements Verification - Am I building the product right? |
|
|
Term
What is the difference between a use case and a user story? |
|
Definition
Use Case vs User Story
Use case = formal description. Contains a lot of information
User Story = less formal. As a ___ I want to ___ in order to ___
- Can be conflicting
- User stories have a benefit. Features are NOT benefits. |
|
|
Term
Software architecture vs software programming? |
|
Definition
Software Architecture - Interactions among parts, structural properties, system-level performance, outside module boundary
Software Programming - Implementations of parts, computational properties, algorithmic performance, inside module boundary |
|
|
Term
What is parnas partitioning? How are partitions decided? |
|
Definition
Parnas Partitioning - Principle used to modularize software systems by dividing them into smaller, manageable, and loosely coupled components. - Modularity - Breaking down the system into discrete modules that can be developed, tested, and maintained independently. - Information Hiding - Each module hides its internal workings from the other modules, exposing only necessary interfaces. - High Cohesion, Low Coupling - Each module has high internal cohesion, low coupling w/ other modules
How to Partition: 1. Identify Sys Reqs - Determine critical functionalities & their dependencies. 2. Decompose the system - Break down the system into major functional areas. Use top down, find potential modules 3. Define Module Boundaries - Establish clear boundaries for each module. Each module must have a distinct function 4. Ensure High Cohesion & Low Coupling - Related functionalities are grouped within modules, low inter-module dependence |
|
|
Term
What is replication / specialization / load balancing? What are they for? |
|
Definition
Replication describes having multiple running instances of a tier. This enhances reliability and availability (by adding redundancy).
Specialization describes making each tier responsible for a single function. This increases modularity
Load balancing is a tier that routes traffic across multiple copies of a tier. This increases availability by ensuring that each server is kept busy and no server is overwhelmed |
|
|
Term
What do unit tests test for? |
|
Definition
- Valid inputs and outputs, as well as error handling |
|
|
Term
Describe what mocks, stubs, and reflectors are. |
|
Definition
- Stubs let us test incomplete systems by returning hardcoded answers in place of the missing parts; reflectors are basically just stubs - Mocks let you fake calls to outside systems like API calls so you can test completely internally (and check how those calls were made) |
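
A sketch of the difference using Python's unittest.mock. The weather client and its fetch method are hypothetical stand-ins for an outside API.

```python
from unittest.mock import Mock


def get_temperature(client, city: str) -> str:
    """Code under test: calls an outside weather API through `client`."""
    data = client.fetch(city)  # hypothetical API method
    return f"{city}: {data['temp_c']}°C"


class StubClient:
    """Stub: a hardcoded stand-in, so the test doesn't need the real service."""

    def fetch(self, city):
        return {"temp_c": 21}


# Stub: only the returned value matters.
assert get_temperature(StubClient(), "Oslo") == "Oslo: 21°C"

# Mock: fakes the call AND records how it was used, so the test can check it.
client = Mock()
client.fetch.return_value = {"temp_c": 5}
assert get_temperature(client, "Bergen") == "Bergen: 5°C"
client.fetch.assert_called_once_with("Bergen")
```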
|
|
Term
What is the difference between unit, integration, performance, smoke, and regression tests? |
|
Definition
- Unit: does it work in a vacuum? - Integration: does it work in its env? - Performance: is it fast/good enough? - Smoke: is it safe to deploy? - Regression: is it still working/safe? |
|
|
Term
What is REST? |
|
Definition
- Representational state transfer. It is a set of principles and guidelines for building web services. |
|
|
Term
What are the HTTP methods for APIs? |
|
Definition
- GET: to retrieve data from the server - POST: to submit data to the server to create a new resource, such as when creating a new record in a database - PUT: to update an existing resource on the server - DELETE: to delete a resource or collection of resources on the server |
|
|
Term
What is the difference between synch (blocking) and asynch (non-blocking) calls? |
|
Definition
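- synch: the caller blocks and waits until it gets a result - asynch: the caller doesn't wait; it keeps running (e.g. the webpage keeps rendering and can display something in the meantime) and handles the result when it arrives |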
|
|
Term
What is a callback function? |
|
Definition
- A function that is passed as an argument to another function and is executed by that function once a specific event or condition occurs
- used by asynch programming so that a function can call another function and keep executing, so it can handle events and responses that may not be available immediately. |
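
A minimal sketch with asyncio's event loop: loop.call_later registers a callback to run after a delay while the caller keeps executing. The 0.1 s delay stands in for a slow response, and on_response is a hypothetical name.

```python
import asyncio
from typing import List


def on_response(data: str, log: List[str]) -> None:
    """The callback: the event loop calls it later, when the 'response' arrives."""
    log.append(data)


async def main() -> List[str]:
    log: List[str] = []
    loop = asyncio.get_running_loop()
    # Register the callback (to fire in 0.1 s) and keep going instead of waiting.
    loop.call_later(0.1, on_response, "response body", log)
    log.append("kept working")  # runs first, because nothing blocked
    await asyncio.sleep(0.2)    # give the event loop time to fire the pending callback
    return log


print(asyncio.run(main()))  # ['kept working', 'response body']
```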
|
|
Term
What are the components of CRUD? |
|
Definition
Data operations usually implement CRUD:
Create Read Update Delete
these operations should be atomic |
|
|
Term
What is atomicity? How is it achieved? |
|
Definition
- Atomicity: A guarantee that an update either completes entirely or has no effect at all; another process never sees (or builds on) a half-finished update
- Atomicity is achieved with transactions that are rolled back on failure, together with locking: while one thread is updating the data, other threads are prevented from changing it mid-stream. |
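
All-or-nothing behavior can be sketched with Python's sqlite3 module, where using the connection as a context manager commits if the block succeeds and rolls back if it raises. The accounts table and the transfer function are hypothetical; the simulated crash between the two updates shows that the first update is undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int, fail_midway: bool = False) -> None:
    # `with conn:` commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("crash between the two updates")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


try:
    transfer("alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
print(balances())  # {'alice': 100, 'bob': 0}: the half-done transfer was rolled back
```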
|
|
Term
What does ACID stand for? |
|
Definition
- A: Atomicity - all-or-nothing transactions
- C: Consistency - any transaction will result in the database being in a valid state
- I: Isolation - if transactions are done concurrently, the result is the same state that would have been reached had the transactions been done serially
- D: Durability - once a transaction has been committed to the database, it will remain in the dataset until it is updated by another transaction, even if the power is lost |
|
|
Term
What is NoSQL? |
|
Definition
NoSQL means anything that isn't a RDB (relational database).
Many non-relational databases (e.g. document stores) store all of the data necessary for a record in a single object
This can lead to a lot of duplicate data, but that's not always an issue |
|
|
Term
What's the difference between a data lake and a data warehouse? |
|
Definition
- Data lake: contains a lot (a lot a lot) of unstructured, barely-processed, raw data. Used for stuff like ML, AI, Streaming analytics etc.
- Data warehouse: smaller, structured, refined. (there was a question on warehouses on midterm 2) |
|
|
Term
What is ETL? |
|
Definition
- Stands for Extract, Transform, Load - A process used in data warehousing to collect, process, and move data from multiple sources into a single, unified destination (eg. data warehouse, database, analytical system)
There are 2 ways to do ETL, but you should do both apparently:
1. Batch processing - long-running & scheduled 2. Stream processing - shorter & event-driven |
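
A toy batch ETL run, assuming a hypothetical CSV export as the source and an in-memory SQLite table as the destination; the column names and cleaning rules are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: the second row is missing its amount.
RAW = "order_id,amount,currency\n1,10.50,usd\n2,,usd\n3,7.25,eur\n"


def extract(text: str) -> list:
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Transform: drop incomplete rows, convert amounts to cents, normalize currency codes."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue
        cents = round(float(row["amount"]) * 100)
        clean.append((int(row["order_id"]), cents, row["currency"].upper()))
    return clean


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, cents INTEGER, currency TEXT)")
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW)), warehouse)
print(warehouse.execute("SELECT * FROM orders ORDER BY id").fetchall())
# [(1, 1050, 'USD'), (3, 725, 'EUR')]
```

A stream-processing version would run the same transform on each record as its event arrives instead of on a scheduled batch.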
|
|
Term
What is the CAP theorem? |
|
Definition
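You can only choose 2 of the following (in practice network partitions happen, so during a partition a system must give up either C or A):
1. (C)onsistency - if a request goes through, you get up-to-date data (the most recent write) 2. (A)vailability - every request gets a (non-error) response, though not necessarily the latest data 3. (P)artition tolerance - the system keeps working even when the network between nodes is partitioned (messages are lost or delayed) |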
|
|
Term
What are some ways consistency is achieved? (in the context of CAP theorem) |
|
Definition
- Conflict Resolution: resolution mechanisms like last-writer-wins (LWW), first-writer-wins (FWW), or custom conflict resolution policies can be used to determine which update should take precedence and ensure consistency
- Distributed Consensus: Provide a way for multiple nodes in a distributed system to agree on a consistent order of operation & ensures that all nodes reach agreement on the order of updates |
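
A minimal last-writer-wins sketch: each replica's write carries a timestamp and a node id, and every node keeps the newer write, using the node id to break exact ties so all replicas pick the same winner. The Versioned type is hypothetical, and the sketch assumes timestamps are comparable across nodes, which clock skew can break in real systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # when the write happened (assumed comparable across nodes)
    node_id: str      # tie-breaker so every replica picks the same winner


def last_writer_wins(a: Versioned, b: Versioned) -> Versioned:
    """Keep whichever write is newer; break exact ties deterministically by node id."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


replica_1 = Versioned("blue", timestamp=10.0, node_id="n1")
replica_2 = Versioned("green", timestamp=12.5, node_id="n2")
print(last_writer_wins(replica_1, replica_2).value)  # green
```

Because the rule gives the same answer regardless of argument order, replicas that exchange updates in different orders still converge on the same value.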
|
|
Term
When to use server-sided vs client sided caching? |
|
Definition
Server side - sacrifices speed for correctness
Client side - sacrifices correctness for speed OR use it when I have v strong assumptions abt how often data updates |
|
|
Term
What is reverse-proxy caching? |
|
Definition
- cache responses (like HTTP responses) on the proxy's disk rather than caching the underlying data itself - this removes the load of cached requests from the backend servers altogether - if a server goes down, you can still serve cached requests until the cache goes stale |
|
|
Term
Scaling up vs scaling out? |
|
Definition
Scaling up = buy a better box (upgrade, vertical scaling) Scaling out = buy more boxes (horizontal scaling) |
|
|
Term
What is sharding? |
|
Definition
- Each server is responsible for some subset of the data. - PROS: Eases load on each server. Redundancy/fault tolerance - CONS: Complexity. CAP theorem. |
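
A common way to decide which server owns a key is to hash the key and take it modulo the number of shards. The sketch assumes three hypothetical shard names. With plain modulo, adding a shard moves most keys to a different server, which is the problem consistent hashing is designed to avoid.

```python
import hashlib
from typing import List

SHARDS = ["db-0", "db-1", "db-2"]  # hypothetical server names


def shard_for(key: str, shards: List[str] = SHARDS) -> str:
    """Map a key to the server responsible for it."""
    # Use a stable hash: Python's built-in hash() of a str changes between runs.
    digest = hashlib.sha256(key.encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


print(shard_for("user:42"))  # always the same server for the same key
```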
|
|
Term
What is cache ejection (eviction)? What types of cache ejection are there? |
|
Definition
- When your cache is full and you get more data, you need to eject some old data in the cache to make room.
- LRU (least recently used): Use if there are NO trends in what resources are requested, but it's a good bet that if someone just asked for it, they'll ask for it again. Eject the least recently used entry - LFU (least frequently used): Use if there are trends in what resources are requested |
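
A minimal LRU sketch built on collections.OrderedDict, which keeps keys in insertion order and lets us move a key to the end on each use; the class name and capacity are illustrative. For caching function results, Python's functools.lru_cache already implements this policy.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that ejects the least recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # least recently used first, most recent last

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # eject the least recently used entry


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # over capacity, so "b" is ejected
print(cache.get("b"))  # None
```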
|
|