Telegram channel techleadbits - TechLead Bits: Unsorted - Telegram channel catalog

Explore articles, books, news, videos, and insights on software architecture, people management, and leadership. Author: @nelia_loginova

TechLead Bits

10 Jan 2025 04:18

Word Embeddings

As we discussed, embeddings can differ depending on the task, but one common task is predicting the context of a word. This is the foundation of how large language models (LLMs) work. LLMs don’t actually understand human language; instead, they work with the numerical relationships between words.

Key properties of a good word embedding:
✔️ Similar words should have similar vectors.
✔️ Dissimilar words should have vectors that are far away from each other.

When each word or data point has a single embedding vector, this is called a static embedding. Once trained, static embeddings contain some semantic information, especially about how words relate to each other. That information comes from the training data, so word similarity depends on the data the model was trained on.
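To make "similar vectors" concrete, here is a minimal Python sketch of a static embedding as a plain lookup table (one fixed vector per word), using cosine similarity as the closeness measure. The three-dimensional vectors and the helper names are hypothetical toy values, not the output of a trained model such as `word2vec`.

```python
import math

# Hypothetical toy static embeddings: exactly one fixed vector per word,
# returned the same way no matter which context the word appears in.
EMBEDDINGS = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.78, 0.70, 0.12],
    "apple": [0.05, 0.10, 0.90],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(word1, word2):
    """Look up both words and compare their vectors."""
    return cosine_similarity(EMBEDDINGS[word1], EMBEDDINGS[word2])


print(round(similarity("king", "queen"), 3))
print(round(similarity("king", "apple"), 3))
```

With these made-up vectors, "king" and "queen" point in nearly the same direction while "apple" does not; a trained model learns such vectors from its training data instead of having them written by hand.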

To address the limitations of static embeddings, contextual embeddings were developed. Contextual embeddings allow a word to have multiple representations based on its surrounding context. For example, the word orange can mean a fruit or a color, so it would have a different embedding for each meaning depending on the context.

TensorFlow offers a great tool called Embedding Projector, where you can play with the `word2vec` static embedding model and see how the training data affects relationships between words.

Embeddings can be used for words, sentences and even documents. An interesting feature built on top of them is context (semantic) search: instead of searching by a particular word, you search by its meaning and can find a related document even if there are no exact matches.
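Context search can be sketched the same way: embed the query and every document, then rank documents by vector closeness. This minimal Python sketch assumes the embeddings already exist; the document titles, vectors, and the `semantic_search` helper are hypothetical, and a real system would get its vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Closeness of two vectors by direction (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical document embeddings; in practice an embedding model produces them.
DOCUMENTS = {
    "How to peel citrus fruit": [0.90, 0.10, 0.00],
    "Kubernetes autoscaling guide": [0.00, 0.20, 0.95],
    "Healthy snacks with oranges": [0.85, 0.30, 0.05],
}


def semantic_search(query_vector, documents, top_k=2):
    """Rank documents by meaning (vector closeness), not by shared keywords."""
    ranked = sorted(
        documents,
        key=lambda title: cosine_similarity(query_vector, documents[title]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embedding of the query "tangerine recipes": it shares no words
# with any title, yet it lands close to the citrus-related documents.
query = [0.88, 0.20, 0.00]
print(semantic_search(query, DOCUMENTS))
```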

#aibasics

05 Jan 2025 05:49

Why Empathy Matters

Conflicts typically happen when our expectations don’t match the actions or expectations of others. Here’s the key question: if the expectations are ours, why do we blame someone else for not meeting them? People are different: they have different values, priorities, principles, backgrounds and life situations. And that’s fine.

Most conflicts can be avoided if both sides agree up front on shared rules to follow.

But what should we do if we have already skipped that step and are in a conflict?

Here are some tips to follow:
✏️ Focus on understanding, not blaming: Explain your position and try to understand the other person’s point of view. Rely on facts, not emotions and personalities. Look for win-win areas where both sides can agree and benefit.
✏️ Listen and show empathy: Listen carefully to what the other person is saying. Try to understand their motivation, and accept their feelings and fears. Remember that people always act in the way that seems best to them. Show that you’re on their side, not against them. Often, people don’t hear each other’s arguments because there isn’t enough trust or interest in each other’s opinions.
✏️ Work together to find a solution: Once you understand both sides of the conflict, look for a solution that satisfies the requirements and interests of both parties. Then start working together in the selected direction. Collaboration helps to build trust and achieve better results.

Empathy is a powerful tool. It’s more efficient for the business to reach an agreement than to prove who is smarter. Remember, conflicts can be very expensive.

#softskills #communications

30 Dec 2024 05:39

Visualization of the 3-dimensional meal classification from the Google ML Crash Course

#aibasics

26 Dec 2024 16:15

Teams with high-quality documentation are 2.4 times more likely to improve software delivery performance and meet reliability goals.

We all understand the importance of good documentation. But how often do you feel frustrated trying to understand how a feature works, looking for a solution to your problem, or checking the source code instead of the docs? Probably often.

Let's start with the difference between good and bad documentation. According to DORA, high-quality documentation should:
- Help users achieve their goals.
- Be accurate, up-to-date, and complete.
- Be easy to find, well-organized, and clear.

Tips to improve documentation:
✏️ Document Critical Use Cases. Clear use cases help users understand how to use your systems effectively.
✏️ Create Documentation Guidelines. When team members know when and how to update documentation or remove inaccurate information, the team can maintain documentation quality over time. Check out Google’s documentation guidelines.
✏️ Automate Guideline Verification. Automate formatting and style checks with tools like Prettier.
✏️ Assign Ownership. Define clear responsibility and ownership for the documentation.
✏️ Define the Audience. Understand who will read your documentation, their background, and their goals.
✏️ Integrate into the Development Process. Documentation should be part of your development process and stored close to the code. Documentation that is kept separate from development becomes outdated right after it is published.
✏️ Automate Testing. Automatically test code samples and detect incomplete documentation.
✏️ Train Your Team. Teach your team how to write good documentation and explain why it’s important. I strongly recommend checking out Google’s technical writing courses for developers.
✏️ Recognize Documentation Work. Recognize and reward documentation efforts during performance reviews and promotions. Writing and maintaining documentation is a core part of software engineering work, and treating it as such improves its quality.
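For automated testing of code samples, one option in Python (an assumption about your stack; other ecosystems have their own tools) is the standard-library `doctest` module, which runs the examples embedded in docstrings and fails when the documented output drifts from the real behavior. The `to_celsius` function is a hypothetical example.

```python
import doctest


def to_celsius(fahrenheit: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius.

    The examples below are documentation and tests at the same time:

    >>> to_celsius(212)
    100.0
    >>> to_celsius(32)
    0.0
    """
    return (fahrenheit - 32) * 5 / 9


if __name__ == "__main__":
    # Runs every docstring example in this module and reports mismatches.
    results = doctest.testmod()
    print(f"{results.attempted} examples run, {results.failed} failed")
```

Running this in CI (or `python -m doctest -v your_module.py`) turns a stale sample into a failing build instead of a confused reader.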

Writing good documentation isn’t easy. It requires clear processes, training, automation, and the right team culture. But investing in documentation improves your team’s development speed and overall delivery performance.

#documentation

19 Dec 2024 04:21

Write Good Error Messages

"Cannot create entity X", "Connection to service Y failed", "Cannot read file Z".
These are typical error messages seen in many systems. And they are extremely bad.

So what’s wrong with these messages? They are not actionable, they give no details about what exactly went wrong (for a failed connection alone I can easily list up to 10 possible reasons), and they don’t tell users or support engineers what to do next.

A good error message should:
- Be actionable
- Be detailed and clear
- Deliver the best user experience
- Enable users to help themselves
- Reduce support workload
- Speed up issue resolution

Google’s technical writing course has a dedicated chapter on writing good error messages. Let’s look at its recommendations:
✏️ Don't fail silently. Failing to report errors is unacceptable. Assume that humans will make mistakes using your software. Try to minimize ways for people to misuse your software, but assume that you can't completely eliminate that.
✏️ Have a common style guide. Examples: Google API Error Handling, Go Error Handling
✏️ Do not swallow the root cause. Generic messages like "Server error" don’t help users understand or fix the issue.
✏️ Fail fast. Report errors as soon as they occur. Raising them later significantly increases debugging costs.
✏️ Identify the cause. Clearly explain what went wrong. Help users understand requirements and constraints. Be specific. Don't assume that users know the limitations of your system.
✏️ Explain how to fix the problem. Create actionable error messages. After explaining the cause of the problem, explain how to resolve it.

Example:
❌ Invalid input.
✅ Enter the pathname of a Windows executable file. An executable file ordinarily ends with the .exe suffix. For example: C:\Program Files\Custom\AppName.exe
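A minimal Python sketch of these recommendations, built around the `.exe` example above. The function names and the config-file scenario are hypothetical; the point is the shape of the messages (cause, constraint, fix) and raising with `from err` so the root cause is not swallowed.

```python
from pathlib import PureWindowsPath


def validate_executable_path(path: str) -> PureWindowsPath:
    """Accept only paths that look like Windows executables."""
    candidate = PureWindowsPath(path)
    if candidate.suffix.lower() != ".exe":
        # Cause + constraint + how to fix, instead of "Invalid input."
        raise ValueError(
            f"'{path}' is not a Windows executable file. Enter the pathname of a "
            "Windows executable file; an executable file ordinarily ends with the "
            ".exe suffix. For example: C:\\Program Files\\Custom\\AppName.exe"
        )
    return candidate


def read_config(path: str) -> str:
    """Read a config file without swallowing the root cause."""
    try:
        with open(path, encoding="utf-8") as config_file:
            return config_file.read()
    except FileNotFoundError as err:
        # 'from err' keeps the original exception attached as __cause__.
        raise RuntimeError(
            f"Cannot read config file '{path}': the file does not exist. "
            "Check that the path is correct and that the file exists."
        ) from err
```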

Take time to train your team to write good error messages: it improves user experience, reduces support costs and speeds up problem resolution.

#engineering #documentation

12 Dec 2024 16:53

DORA’s delivery metrics and organization performance levels.

#engineering #devops #delivery

11 Dec 2024 05:56

A really nice demonstration of how the classification threshold affects prediction results (from the Google ML Crash Course).

#aibasics

07 Dec 2024 17:01

The Mood Map

Let’s look at what makes this tool interesting and how it can be used in daily life and work.

✏️ Emotions Mapping. This is the ability to recognize emotions in yourself, your colleagues, and your partners. One helpful technique is mirroring: matching the other person’s speech rate and tone of voice. At the neurophysiological level it signals “This person is like me!”, which makes communication more pleasant, gives a sense of safety and increases the chances of reaching an agreement.

✏️ Task Selection. The Mood Map helps you choose tasks for yourself or your team based on the current emotional state. For example, anxiety can sharpen focus, happiness and joy are good for creativity, and contentment improves the chances of reaching a consensus. The key idea is to either pick tasks that match your current state or shift your state to suit the task. This applies to teams too: "If you have a brainstorming session and the team seems anxious, that’s not a good match. As a leader, you either have to change the tone of the room or change the agenda to match the tone".

✏️ Understanding. What makes you happy might not make someone else happy. That’s why it’s important to learn what motivates and inspires your team members. At the same time, emotions have universal causes. If you understand the root of someone’s behavior, you can address it effectively.

✏️ Changing Emotions. Agreements are hard to reach if you and the other person are in different quadrants of the Mood Map. Ideally, everyone needs to move to the green quadrant to reach a consensus. However, jumping directly from red to green is almost impossible. Instead, you can guide someone through smaller transitions, like red -> blue -> green. For example, if someone is in the red quadrant, speaking slowly and calmly can help reduce emotional intensity and shift them toward blue.

I used to be skeptical about emotional intelligence techniques, but this tool seems helpful and practical.

One more trick: during difficult conversations, if emotions are escalating, pause and ask yourself, “What am I feeling right now? Why?”. Reflection helps to shift your brain from the emotional side to the logical side. Once you’re back in a logical state, you can manage the situation better and improve your chances of success.

References:
- David Caruso YouTube Channel
- Can emotional intelligence be learned?
- Emotional Intelligence in a Changing World

#softskills #leadership #communications #productivity

02 Dec 2024 17:26

A visual summary of the main stages from the Minimum Viable Architecture talk.

#architecture

28 Nov 2024 17:18

Visualization of how different loss functions can change model training results. As mentioned earlier, MSE pulls the model more toward outliers, while MAE doesn't.
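A minimal Python sketch of why this happens, assuming the simplest possible model: a single constant prediction. Brute-forcing over candidate constants shows that the value minimizing MSE is pulled toward the outlier (it equals the mean), while the value minimizing MAE stays with the bulk of the data (the median). The data and the candidate grid are hypothetical.

```python
def mse(predictions, targets):
    """Mean squared error: large errors are squared, so outliers dominate."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def mae(predictions, targets):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def best_constant(targets, loss, candidates):
    """Pick the constant prediction with the lowest loss (brute force)."""
    return min(candidates, key=lambda c: loss([c] * len(targets), targets))


targets = [1.0, 2.0, 3.0, 4.0, 100.0]       # 100.0 is the outlier
candidates = [i * 0.5 for i in range(201)]  # 0.0, 0.5, ..., 100.0

print(best_constant(targets, mse, candidates))  # 22.0 (the mean, pulled toward 100)
print(best_constant(targets, mae, candidates))  # 3.0 (the median, ignores the outlier)
```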

#aibasics

28 Nov 2024 17:03

Examples of a model that converges vs one that doesn't.

#aibasics

25 Nov 2024 18:06

ML Introduction

Let’s start the AI basics series with the definition of ML and its types.

Definition from the Google ML Introduction course:

ML is the process of training a piece of software, called a model, to make useful predictions or generate content from data.

ML Types:
📍 Supervised Learning. The model is trained on lots of data with existing correct answers. It's "supervised" in the sense that a human gives the ML system data with the known correct results. This type is used for regression and classification tasks.
📍 Unsupervised Learning. The model makes predictions using data that does not contain any correct answers. A commonly used unsupervised learning model employs a technique called clustering. The difference from classification is that categories are discovered during training rather than defined by a human.
📍 Reinforcement Learning. The model makes predictions by receiving rewards or penalties based on the actions it performs. The goal is to find the strategy that earns the most rewards. This approach is used, for example, to train robots to perform different tasks.
📍 Generative AI. The model creates content (text, images, music, etc.) from user input. These models learn existing patterns in data in order to produce new but similar data.
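To illustrate how unsupervised learning discovers categories on its own, here is a minimal sketch of k-means, one common clustering algorithm (picked here as an example; the definition above only says "clustering"). It works on one-dimensional points for brevity and uses a naive initialization from the first k points; the data is hypothetical and no labels are given.

```python
def kmeans_1d(points, k, max_iterations=100):
    """Group 1-D points into k clusters without any labels."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: abs(p - centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical unlabeled data with two natural groups.
print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], k=2))
# → ([2.0, 11.0], [0, 0, 0, 1, 1, 1])
```

The algorithm was never told what the groups mean; it only found that the points fall into two clusters, which is the difference from classification described above.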

Each ML type has its own purpose, like making predictions, finding patterns, creating content, or automating routine tasks. Among them, Generative AI is the most popular and well-known today.

#aibasics

22 Nov 2024 04:16

Cloud Ecosystem Trends

This week CNCF published “Emerging trends in the cloud native ecosystem”, a list of trends that will continue to grow in 2025.

Top trends:
🚀 Cloud Cost Optimization. With growing cloud adoption, businesses focus on controlling cloud costs using tools like Karpenter and OpenCost. The same trend was also highlighted by the FinOps Foundation earlier this year.
🚀 Platform Engineering (I wrote an overview here). Improve developer experience with platforms for observability, policy as code, internal developer portals, security, CI/CD, and storage to speed up business development.
🚀 AI Synergy. The trend is to support AI training and operations in the cloud. Actively developed projects in this area:
- OPEA: a collection of cloud-native patterns for GenAI workloads
- Milvus: a high-performance vector database
- Kubeflow: a project to deploy machine-learning workflows on Kubernetes
- KServe: a toolset for serving predictive and generative machine-learning models
🚀 Observability Standards Unification. Projects like OpenTelemetry and the Observability TAG unify standards, minimize vendor lock-in, and reduce costs.
🚀 Security. Security is a top-priority topic in CNCF. There are some newly graduated projects in this area (like Falco) and a separate TAG Security group that publishes white papers with guidance for the industry.
🚀 Sustainability (more about GreenOps here). Sustainability tools (like Kepler and OpenCost) measure the carbon footprint of Kubernetes applications. The area is under active development, but it already has promising open-source projects and standards.

Interestingly, overprovisioning and resource waste are still the main problems in modern clouds. According to the Kubernetes Cost Benchmark Report, clusters with 50 or more CPUs used only 13% of their provisioned CPUs, and memory utilization was around 20%. This shows a huge opportunity for future optimization.

#news

14 Nov 2024 03:38

Columnar Databases

Traditional databases store data row by row, which is optimized for transactional, single-entity lookups. But if you need to aggregate data by a specific column, the system has to read all columns from disk, which slows down queries and increases resource usage.
To solve this issue, columnar databases were introduced.

A columnar database is a type of database that stores the values of each column together on disk.

Consider the following sample table:

| Account | LastName | FirstName | Purchase, $ |
|---------|----------|-----------|-------------|
| 0122 | Jones | Jason | 325.5 |
| 0123 | Diamond | Richard | 500 |
| 0124 | Tailor | Alice | 125 |

A row-oriented database stores it as follows:

   0122, Jones, Jason, 325.5;
   0123, Diamond, Richard, 500;
   0124, Tailor, Alice, 125;

A columnar database stores it as:

   0122, 0123, 0124;
   Jones, Diamond, Tailor;
   Jason, Richard, Alice;
   325.5, 500, 125;
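To make the I/O difference concrete, here is a minimal Python sketch that stores the sample table both ways and sums the Purchase column. It counts individual values touched as a stand-in for disk reads, which is a simplification: real engines read pages or blocks, not single values.

```python
# The sample table from above, row-oriented: one tuple per record.
ROWS = [
    ("0122", "Jones", "Jason", 325.5),
    ("0123", "Diamond", "Richard", 500.0),
    ("0124", "Tailor", "Alice", 125.0),
]

# The same data, column-oriented: one list per column.
COLUMNS = {
    "Account": [row[0] for row in ROWS],
    "LastName": [row[1] for row in ROWS],
    "FirstName": [row[2] for row in ROWS],
    "Purchase": [row[3] for row in ROWS],
}


def total_purchases_row_store(rows):
    """A row store has to read every field of every row to reach Purchase."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)
        total += row[3]
    return total, values_read


def total_purchases_column_store(columns):
    """A column store reads only the Purchase column."""
    purchases = columns["Purchase"]
    return sum(purchases), len(purchases)


print(total_purchases_row_store(ROWS))        # (950.5, 12)
print(total_purchases_column_store(COLUMNS))  # (950.5, 3)
```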

Benefits of the approach:
📍High data compression due to the similarity of data within a column
📍Enhanced query and aggregation performance for analytical and reporting tasks
📍Reduced I/O load as there is no need to process irrelevant data
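The compression benefit can be illustrated with run-length encoding, one of several techniques columnar engines may use (picked here only as an example). Because a column holds values of one type that often repeat, a run of identical values collapses into a single (value, count) pair. The `country` column below is hypothetical.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


country = ["US", "US", "US", "DE", "DE", "FR"]  # hypothetical column values
encoded = run_length_encode(country)
print(encoded)  # [('US', 3), ('DE', 2), ('FR', 1)]
assert run_length_decode(encoded) == country
```

In a row layout the same values would be interleaved with other fields, so runs like this rarely form.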

The most popular columnar databases:
1. Amazon Redshift
2. Google Cloud Bigtable
3. Microsoft Azure Cosmos DB
4. Apache Druid
5. Vertica
6. ClickHouse
7. Snowflake Data Cloud

Columnar databases are well suited for data warehouses, real-time analytics, statistics, and storing and aggregating time-series data.

#engineering

07 Nov 2024 14:08

Google ARM Processor

Last week, Google announced its own custom ARM-based processor for general-purpose workloads, promising up to 65% better price-performance and up to 60% better energy efficiency.

Why is this interesting? Until now, only AWS offered a custom cost-optimized ARM processor, AWS Graviton, and now Google has joined the competition. This shows that interest in ARM processors keeps growing and is likely to continue growing.

From an engineering perspective, you can't just switch a workload from one architecture to another, because images need to be built for a specific architecture. One way to test ARM nodes and migrate smoothly to the new architecture is to use multi-architecture images (I wrote about that here).

#engineering #news

08 Jan 2025 18:51

Inspired: How to Create Tech Products Customers Love

"Developing a product mindset is essential for any engineer, and it becomes mandatory when aiming for an engineering leadership role." I read that in Hybrid Hacker newsletter about engineering leader roadmap some time ago.

So I decided to improve my product mindset and understand what product managers really do 😉 by reading one of the most popular books in this area: Inspired: How to Create Tech Products Customers Love by Marty Cagan.

Key thoughts from the book:
✏️ Product teams should have enough autonomy and a wide set of responsibilities. Teams should own the product they develop.
✏️ A product should have an inspiring vision. The vision is what the business wants to achieve.
✏️ A product should have a strategy. The strategy describes how the vision will be achieved.
✏️ Product teams should have the right business context to make the right decisions.
✏️ Teams should use fast and cheap prototypes to test business ideas.
✏️ If the company uses OKRs, they should be set for teams, not for individuals.
✏️ Continuous product research and innovation are the foundation of business success.

Principles for a good product vision:
✔️ Always start with Why. Define the main product goal.
✔️ Focus on the problem, not the solution.
✔️ Be ambitious.
✔️ Be ready to break something old to build something new.
✔️ Be inspirational. Your vision should excite people, they should want to be a part of the product.
✔️ Be aligned with current industry trends.
✔️ Focus on the future.
✔️ Allow flexibility in details.
✔️ Require extra effort to achieve. If the vision is easy to achieve, it may not be ambitious enough.
✔️ Evangelism. Regularly explain product vision to the teams and stakeholders.

Principles for a good product strategy:
✔️ Focus on one user profile or business niche at a time.
✔️ Align with the company’s overall strategy.
✔️ Align with the marketing strategy.
✔️ Focus on customers, not competitors.
✔️ Share the strategy with the teams.

The principles and recommendations about product vision and strategy were the most valuable parts of the book for me. Other chapters felt a bit trivial: using prototypes, making them cheap and fast, not turning the MVP into a real product, knowing your customers, etc. Still, the book is really good for improving your understanding of product management.

#booknook #product #leadership

31 Dec 2024 02:57

Happy New Year!

Another year has come to an end.

For me, it was a heavy and challenging year. But without a doubt, I’ve learned a lot, grown, and I am ready to move forward.

I wish you health and inner harmony. Take care of yourself, your family, your friends, and maintain a balance between work and personal life.

May the coming year be better than the last! Dream, plan, achieve, learn, experiment, love, move forward and enjoy every moment!

Happy New Year, Friends!🎄🎄🎄

30 Dec 2024 05:38

Embeddings

In the Data Vectorization post, we learned that ML algorithms work with feature vectors. For example, we had a vector for car colors, where a blue car was represented as (0.0, 1.0, 0.0, 0.0, 0.0).

Imagine we create feature vectors for meal items in a dataset with 5,000 different elements. Each vector would have 5,000 elements, all set to 0.0 except for one element set to 1.0. This approach would require a large number of weights and a lot of memory and computational resources, making the model inefficient and hard to maintain.

To optimize this, embedding techniques are used. An embedding is a projection of the high-dimensional space of the initial data vectors into a lower-dimensional space.

For example, in the meal dataset, we could introduce a feature like "sandwichness" that scores how likely an item is to be a sandwich. A sandwich might have a score of 0.99, shawarma 0.9, and soup 0.0. But one feature isn’t enough, so we could add other dimensions, like "dessertness", "veganness" or "liquidness", to better describe each item.

With features like sandwichness, dessertness, and liquidness, the vector for a hotdog might look like (0.95, 0.8, 0.0). Real-world models use many dimensions, but these vectors are much shorter and more efficient than the original 5,000-element vectors with only 0.0 and 1.0 values.
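A minimal Python sketch of the idea, assuming a tiny hypothetical menu of four items instead of 5,000. Multiplying a one-hot vector by an embedding table (one row per item, one column per feature) simply selects that item's row, which is why an embedding can be stored and used as a lookup table. Only the sandwichness scores and the hotdog vector come from the text above; every other value is an assumption.

```python
# Hypothetical menu of 4 items (instead of 5,000) and a 3-feature embedding table:
# columns are (sandwichness, dessertness, liquidness).
MENU = ["sandwich", "shawarma", "soup", "hotdog"]
EMBEDDING_TABLE = [
    [0.99, 0.00, 0.00],  # sandwich: sandwichness from the text, other values assumed
    [0.90, 0.00, 0.10],  # shawarma: sandwichness from the text, other values assumed
    [0.00, 0.00, 0.95],  # soup: sandwichness from the text, liquidness assumed
    [0.95, 0.80, 0.00],  # hotdog: vector from the text
]


def one_hot(item):
    """Sparse representation: as long as the menu, with a single 1.0."""
    vector = [0.0] * len(MENU)
    vector[MENU.index(item)] = 1.0
    return vector


def embed_via_matmul(item):
    """one_hot (1 x 4) times EMBEDDING_TABLE (4 x 3) gives a dense 1 x 3 vector."""
    sparse = one_hot(item)
    n_features = len(EMBEDDING_TABLE[0])
    return [
        sum(sparse[i] * EMBEDDING_TABLE[i][j] for i in range(len(MENU)))
        for j in range(n_features)
    ]


def embed_via_lookup(item):
    """The same result without the multiplication: just read the item's row."""
    return EMBEDDING_TABLE[MENU.index(item)]


print(one_hot("hotdog"))           # [0.0, 0.0, 0.0, 1.0]
print(embed_via_matmul("hotdog"))  # [0.95, 0.8, 0.0]
```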

ML practitioners select dimensions based on the task they want to solve. This means that embeddings for the same items can be different depending on the context, task and provided data.

References:
- What are Embeddings in ML?
- ML Google Crash Course: Embeddings

#aibasics

23 Dec 2024 06:45

Data Vectorization

Let’s talk about one of the most fascinating parts of machine learning - data vectorization.

At first glance, it seems that ML models work directly with the raw data we provide. But actually, they don’t. ML algorithms need a numerical representation of data. Specifically, they require data in the form of floating-point values called a feature vector.

However, many features are naturally strings or other non-numerical values. The task is to transform these non-numerical values into numerical ones. That’s the main purpose of feature engineering.

Let me illustrate this with the vectorization of categorical data.

Categorical data consists of specific, predefined values, like car colors, animal species, days of the week, or city street names. It can be low-dimensional (few possible values) or high-dimensional (many possible values).

For low-dimensional data, we can encode it as a vocabulary. Let’s use car colors as an example with 5 categories for simplicity: white, blue, red, black, and others (for any color not in the list).

To create a vector, follow these steps:
1. Index each value:
- 0: white
- 1: blue
- 2: red
- 3: black
- 4: others

2. Represent each category as a vector (array) of N elements, where N is the number of categories.
|Feature|White|Blue|Red|Black|Others|
|------------|---------|-------|------|--------|---------|
|White | 1 | 0 | 0 | 0 | 0 |
|Blue | 0 | 1 | 0 | 0 | 0 |
|Red | 0 | 0 | 1 | 0 | 0 |
|Black | 0 | 0 | 0 | 1 | 0 |
|Others | 0 | 0 | 0 | 0 | 1 |

3. Convert to floating point values: replace 1 with 1.0 and 0 with 0.0. For example, the vector for blue would be
(0.0, 1.0, 0.0, 0.0, 0.0)
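The three steps map almost directly to code. A minimal Python sketch, assuming the five-category vocabulary above; `encode_color` is a hypothetical helper name.

```python
# Step 1: the position in this list is the category index.
CATEGORIES = ["white", "blue", "red", "black", "others"]


def encode_color(color):
    """Steps 2-3: one-hot vector of floats; unknown colors map to "others"."""
    if color not in CATEGORIES:
        color = "others"
    vector = [0.0] * len(CATEGORIES)
    vector[CATEGORIES.index(color)] = 1.0
    return vector


print(encode_color("blue"))   # [0.0, 1.0, 0.0, 0.0, 0.0]
print(encode_color("green"))  # [0.0, 0.0, 0.0, 0.0, 1.0]
```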

Of course, that’s a very basic example to illustrate the concept. Real-world cases often involve more complex transformations and math models, depending on the data and the problem.

More details and vectorization strategies:
- Working with numerical data
- Working with categorical data
- Datasets, generalization, and overfitting

#aibasics

16 Dec 2024 16:10

DORA 2024 Report

Last time I wrote about DORA’s key delivery metrics, and today I want to share key trends from the DORA 2024 State of DevOps Report:

✏️ Artificial Intelligence (AI) Adoption. The report shows increasing AI adoption, especially for the following tasks:
- Writing code
- Summarizing information
- Explaining unfamiliar code
- Optimizing code
- Documenting code
- Writing tests
- Debugging code
- Data analysis
While AI boosts individual productivity, the report links a 25% increase in AI adoption to an estimated 1.5% reduction in delivery throughput and a 7.2% decrease in delivery stability. One possible explanation is that AI lets teams produce larger changelists, which in turn increases the complexity of deployments and the risk of failure.

✏️ Platform Engineering. Platform engineering has become a critical discipline for high-performing teams: teams that use internal developer platforms saw a 10% increase in team performance and an 8% boost in individual productivity. At the same time, there was a 14% decrease in change stability. So platform engineering needs to be implemented carefully to avoid increasing overall pipeline complexity and reducing stability.

✏️ Developer Experience. The report shows that focusing on the user increases productivity and job satisfaction while reducing the risk of burnout. When this user focus is combined with high-quality internal documentation, the increase in product performance is amplified.

✏️ Organizational Stability. Prioritizing stability in both technical and operational decisions can lead to higher team productivity and lower burnout.

✏️ Transformational Leadership. Transformational leadership is a model in which leaders inspire and motivate employees to achieve higher performance. A 25% increase in transformational leadership leads to a 9% rise in productivity, reduced burnout, and improved team and product performance. Such leaders encourage their teams along the following dimensions:
- Vision. They have a clear vision of where their team and the organization are going.
- Inspirational Communication. They say positive things about the team and make employees proud to be a part of their organization.
- Intellectual Stimulation. They challenge team members to think about old problems in new ways and to rethink some of their basic assumptions about their work.
- Supportive Leadership. They consider others’ personal feelings before acting and behave in a way that is thoughtful of others’ personal needs.
- Personal Recognition. They commend team members when they do a better-than-average job and acknowledge improvements in the quality of their work.

The report highlights that transformation isn’t a one-time achievement but an ongoing process. Companies that are not continuously improving are actually falling behind. Companies that adopt a mindset of continuous improvement see the highest levels of success.

#engineering #news

12 Dec 2024 16:52

DORA: Measuring Delivery Performance

If you're interested in measuring the quality of your software delivery processes, you've probably heard about DORA. DORA (DevOps Research and Assessment) is a research program from Google that studies what helps teams improve software delivery and operations performance.

DORA identified four software delivery metrics:
✏️ Deployment Frequency: How often an organization successfully releases to production.
✏️ Lead Time for Changes: The amount of time it takes a commit to get into production.
✏️ Change Failure Percentage: The percentage of deployments that cause failures in production and require hotfixes or rollbacks.
✏️ Failed Deployment Recovery Time: The time it takes to recover from a failed deployment. A lower recovery time indicates a more resilient and responsive system.
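As a rough illustration, the four metrics can be computed from a list of deployment records. This is a minimal Python sketch, assuming a hypothetical `Deployment` record with commit, deploy, and restore timestamps; it uses medians for the time-based metrics, which is a common choice but not the only one, and real tooling has to gather this data from several systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    failed: bool = False                      # needed a hotfix or rollback
    restored_time: Optional[datetime] = None  # when service recovered (failures only)


def dora_metrics(deployments: List[Deployment], period_days: float) -> dict:
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deploy_time - d.commit_time for d in deployments]
    recovery_times = [d.restored_time - d.deploy_time for d in failures if d.restored_time]
    return {
        # Deployment Frequency: deployments per day over the period
        "deployment_frequency_per_day": len(deployments) / period_days,
        # Lead Time for Changes: commit -> production
        "median_lead_time": median(lead_times) if lead_times else None,
        # Change Failure Percentage
        "change_failure_percentage": (
            100.0 * len(failures) / len(deployments) if deployments else 0.0
        ),
        # Failed Deployment Recovery Time
        "median_recovery_time": median(recovery_times) if recovery_times else None,
    }


start = datetime(2024, 12, 1, 9, 0)
deployments = [
    Deployment(start, start + timedelta(hours=1)),
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=3), failed=True,
               restored_time=start + timedelta(hours=3, minutes=30)),
    Deployment(start, start + timedelta(hours=4)),
]
print(dora_metrics(deployments, period_days=2))
```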

DORA’s research demonstrated that speed and stability are not tradeoffs; these metrics are correlated for most teams, and top performers do well across all four. The challenge is often collecting fragmented data from different DevOps tools, but some open-source solutions can simplify the process.

The main pitfall in using DORA's delivery metrics is turning them into the main goal of the team's work. Instead, treat them as a way to measure progress and guide improvements.

#engineering #devops #delivery

11 Dec 2024 05:53

Binary Data Classification

In the previous #aibasics post, I briefly explained the basics of machine learning with linear regression. Today let's talk about another type of task: binary classification. A typical example is determining whether an email is spam or not spam.

Key steps for binary classification:

1. Predict Probability. Use a logistic regression model, which predicts a probability (mathematically, it returns values between 0 and 1), for example, the probability that an input email is spam. If the model predicts 0.72, this means there is a 72% chance the email is spam and a 28% chance it is not spam.

2. Set a Classification Threshold. The classification threshold determines how to assign a binary label (e.g., spam or not spam) based on the predicted probability. For example, suppose the model predicts that a given email has a 75% chance of being spam. Does that mean the email is spam? Not necessarily: if the threshold is set to 0.8, the email will be classified as not spam.

3. Evaluate the Model Using a Confusion Matrix. To measure how good the model is, summarize the number of correct and incorrect predictions in a confusion matrix:
- True Positive (TP): Correctly predicted positive cases.
- False Negative (FN): Positive cases incorrectly predicted as negative.
- False Positive (FP): Negative cases incorrectly predicted as positive.
- True Negative (TN): Correctly predicted negative cases.

4. Measure Classification Quality. The following metrics are used to evaluate the effectiveness of the resulting model:
- Accuracy. The proportion of all classifications that were correct, whether positive or negative.
- Recall. The proportion of all actual positives that were classified correctly as positives.
- False Positive Rate. The proportion of all actual negatives that were classified incorrectly as positives.
- Precision. The proportion of all the model's positive classifications that are actually positive.
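The four steps fit in a few lines of plain Python. This is a minimal sketch with hypothetical probabilities and labels for six emails; it assumes a default threshold of 0.5 and treats a probability equal to the threshold as positive, which is one common convention rather than a rule.

```python
import math


def sigmoid(z):
    """Step 1: logistic regression squashes its linear output z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(probabilities, threshold=0.5):
    """Step 2: label as positive (spam) when probability >= threshold."""
    return [p >= threshold for p in probabilities]


def confusion_matrix(actual, predicted):
    """Step 3: count (TP, FN, FP, TN)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    return tp, fn, fp, tn


def quality_metrics(tp, fn, fp, tn):
    """Step 4: accuracy, recall, false positive rate, precision."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical model outputs for six emails and their true labels (True = spam).
probabilities = [0.90, 0.75, 0.40, 0.20, 0.85, 0.30]
actual = [True, True, False, False, False, True]

print(sigmoid(0.0))  # 0.5
predicted = classify(probabilities, threshold=0.5)
counts = confusion_matrix(actual, predicted)
print(counts)  # (2, 1, 1, 2)
print(quality_metrics(*counts))
```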

The classification threshold and quality metrics should be chosen based on the cost of errors in a particular domain. If marking important emails as spam is costly, you may increase the threshold to reduce false positives. Conversely, if missing spam emails is more problematic, you may lower the threshold to catch more of them.
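To see this trade-off in numbers, here is a small Python sketch that counts false positives and false negatives at two thresholds. The scores and labels are hypothetical.

```python
def count_errors(probabilities, actual, threshold):
    """Return (false positives, false negatives) at a given threshold."""
    fp = sum(1 for p, a in zip(probabilities, actual) if p >= threshold and not a)
    fn = sum(1 for p, a in zip(probabilities, actual) if p < threshold and a)
    return fp, fn


# Hypothetical scores: the first three emails are legitimate, the last three are spam.
probabilities = [0.10, 0.55, 0.70, 0.60, 0.90, 0.95]
actual = [False, False, False, True, True, True]

print(count_errors(probabilities, actual, threshold=0.5))  # (2, 0)
print(count_errors(probabilities, actual, threshold=0.8))  # (0, 1)
```

Raising the threshold from 0.5 to 0.8 removes both false positives (legitimate emails flagged as spam) but lets one spam email through.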

References:
- Google ML Course: Logistic Regression
- Google ML Course: Classification
- Confusion matrix in machine learning

#aibasics

07 Dec 2024 16:54

Today I’m starting a topic with the picture first.

This is D. Caruso’s Mood Map, a tool widely used in emotional intelligence techniques. It maps our emotions on a grid with two scales:
- One scale is the level of energy (low to high).
- The other scale is the level of pleasantness (unpleasant to pleasant).

How to use it will be explained in the next post 👇

#softskills #leadership

02 Dec 2024 17:25

Minimum Viable Architecture

There is no one-size-fits-all architecture for every scale and every project phase. Architecture should evolve with the product and be adapted to its requirements at different stages of the product lifecycle.

That's the main idea of Randy Shoup's talk Minimum Viable Architecture. He calls this approach “just enough architecture”: an architecture that's good enough for the product to be released at the current project stage.

Product Stages and Their Architecture:

📍 Prototyping.
- Goal: prove the business concept, test the market, and acquire the first customers.
- Rapid iterations, a lot of prototyping.
- Technology doesn't matter; use any tools that get results fast.
- No architecture
- Single team

📍 Starting.
- Goal: solve customer needs as cheaply as possible and acquire more customers.
- Rapid learning and iterations.
- Use a simple, familiar tech stack.
- Typically a monolithic architecture with a single database.
- Rely on cloud infrastructure and open-source tools.
- Focus on core competencies, outsource everything else.
- Number of teams grows.

📍 Scaling.
- Goal: stay ahead of rapidly growing business.
- Time to rearchitect: "Getting to rearchitect a system is a sign of success, not failure."
- Build a scalable architecture; focus on latency and performance.
- Migrate from the monolith to microservices.
- Scale the number of teams.

📍 Optimizing.
- Goal: make a system more sustainable, efficient and effective.
- Focus on small, incremental improvements.
- No major architectural changes.
- Improve operational efficiency.
- Consolidate the teams

I like the idea of matching architecture to business priorities and not overcomplicating the solution at early stages. The talk also offers tips on when rearchitecting is really needed and how to do it without breaking the existing solution. Some of the ideas and recommendations about architecture look too dogmatic to me. Overall, though, the talk is really good, and I recommend watching the full video.

#architecture

28 Nov 2024 17:17

Linear Regression

Linear regression is one of the simplest supervised ML models: it finds a linear relationship between features and a label.

Mathematically it looks like:

y' = b + w1*x1 + w2*x2 + ... + wn*xn

where
- y' - predicted value
- b - bias (calculated during training)
- w1 ... wn - weights of the features (calculated during training)
- x1 ... xn - feature values (input to the model)

Loss for this type of model is usually calculated as the mean squared error (MSE) or the mean absolute error (MAE):
- MSE squares each error, so it is sensitive to outliers and pulls the model toward them.
- MAE averages the absolute differences, making it less sensitive to outliers.
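Here is a small illustration with hypothetical numbers. It scores four predictions that are each off by 1. It then scores the same predictions against labels where one label is an outlier.

```python
def mse(predictions, labels):
    # Mean of squared errors: large errors dominate because they are squared.
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def mae(predictions, labels):
    # Mean of absolute errors: every unit of error counts the same.
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)


predictions = [11.0, 11.0, 12.0, 12.0]
labels = [10.0, 12.0, 11.0, 13.0]               # every prediction is off by 1
labels_with_outlier = [10.0, 12.0, 11.0, 33.0]  # the last label is off by 21

print(f"MSE={mse(predictions, labels)}, MAE={mae(predictions, labels)}")  # MSE=1.0, MAE=1.0
print(f"MSE={mse(predictions, labels_with_outlier)}, "
      f"MAE={mae(predictions, labels_with_outlier)}")                      # MSE=111.0, MAE=6.0
```

Suppose a model could only predict a single constant. The constant that minimizes MSE is the mean of the labels, while the one that minimizes MAE is the median. A single extreme label therefore moves the MSE-optimal answer much more.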

Training steps:
1. Calculate the loss with the current weight and bias.
2. Determine the direction to move the weights and bias that reduce loss.
3. Move the weight and bias values a small amount in the direction that reduces loss.
4. Return to step one and repeat the process until the model can't reduce the loss any further.
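These four steps describe gradient descent. The sketch below applies them to a single-feature model y' = b + w*x with MSE loss. The learning rate, the tolerance, and the toy data (generated from y = 3 + 2x) are assumptions chosen for illustration.

```python
def train_linear_regression(xs, ys, learning_rate=0.05, max_epochs=10_000, tolerance=1e-12):
    """Fit y' = b + w*x by gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    previous_loss = float("inf")
    loss = previous_loss
    for _ in range(max_epochs):
        # Step 1: loss (MSE) with the current weight and bias.
        errors = [(b + w * x) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        # Step 4: stop once an iteration no longer reduces the loss noticeably.
        if previous_loss - loss < tolerance:
            break
        previous_loss = loss
        # Step 2: gradient of the MSE; it points in the direction that increases loss.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step 3: move a small amount (scaled by the learning rate) the opposite way.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss


# Toy data generated from y = 3 + 2x, so the ideal answer is w = 2, b = 3.
w, b, loss = train_linear_regression([0, 1, 2, 3, 4], [3, 5, 7, 9, 11])
print(f"w={w:.3f}, b={b:.3f}")  # w=2.000, b=3.000
```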

Example:
The model needs to predict taxi ride prices based on features like distance and ride duration. Past ride prices can be used as labels.

The model formula:

y' = b + w1*distance + w2*ride_duration

The goal is to find values for b, w1, and w2 that minimize the MSE for the given labels. A well-trained model should converge after a limited number of iterations, at which point the loss cannot be reduced any further.
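For a model this small, you can also find the MSE-minimizing values in closed form with ordinary least squares instead of iterating. The sketch below runs NumPy's `np.linalg.lstsq` on synthetic rides. The "true" pricing rule (b = 2.5, w1 = 1.2 per km, w2 = 0.3 per minute) and the noise level are invented for illustration, not taken from real taxi data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical "true" pricing rule, used only to generate synthetic labels.
TRUE_B, TRUE_W1, TRUE_W2 = 2.5, 1.2, 0.3

n_rides = 200
distance = rng.uniform(1, 20, size=n_rides)        # km
ride_duration = rng.uniform(5, 60, size=n_rides)   # minutes
noise = rng.normal(0, 0.5, size=n_rides)           # random price variation
price = TRUE_B + TRUE_W1 * distance + TRUE_W2 * ride_duration + noise  # labels

# Design matrix: a column of ones for the bias b, then one column per feature.
X = np.column_stack([np.ones(n_rides), distance, ride_duration])

# Least squares returns the b, w1, w2 that minimize the sum of squared errors (hence the MSE).
(b, w1, w2), *_ = np.linalg.lstsq(X, price, rcond=None)

predicted = X @ np.array([b, w1, w2])
mse = np.mean((predicted - price) ** 2)
print(f"b={b:.2f}, w1={w1:.3f}, w2={w2:.3f}, MSE={mse:.3f}")

# Predict the fare of a new 10 km, 25 minute ride.
print(f"predicted fare: {b + w1 * 10 + w2 * 25:.2f}")
```

The fitted values land close to the invented ones. The remaining MSE is roughly the variance of the added noise (0.25), which is the part no linear model can explain. Gradient descent, following the training steps described earlier, would converge toward the same minimum.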

Use Cases:
✏️ Predicting Outcomes. Forecast values based on multiple inputs, e.g., taxi fares, apartment rentals, or flight prices.
✏️ Discovering Relationships. Reveal how variables are related and how a change in one variable affects the result.
✏️ Process Optimization. Optimize processes by understanding the relationships between different factors.

Studying linear regression made me realize why I learned linear algebra and statistics at university 😄. I really had some fun with the math and dynamic examples.

References:
- Google ML Crash Course: Linear Regression
- Understanding Multiple Linear Regression in ML

#aibasics

28 Nov 2024 17:02

ML Basic Terms

To be on the same page with AI experts, we need a shared vocabulary of basic terms and concepts:
✏️ Feature - an input variable for the model. It usually represents a characteristic of the entity or fact for which the model makes a prediction.
✏️ Label - the known answer for an input example. Labels are used to train supervised models: the predicted value is compared with the label to measure the discrepancy.
✏️ Loss - a measure of the difference between the predicted value and the label. Different models use different functions to calculate loss.
✏️ Learning Rate - a floating-point number that tells the optimization algorithm the step size for each iteration while moving toward a minimum of the loss function. If the learning rate is too low, the model can take a long time to converge. If it is too high, the model may never converge.
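This toy sketch shows the learning-rate effect. It assumes the loss (w - 3)^2, whose minimum is at w = 3, and runs 20 gradient-descent steps starting from w = 0. The function and values are hypothetical.

```python
def gradient_descent(learning_rate: float, steps: int = 20, start: float = 0.0) -> float:
    """Minimize the toy loss (w - 3)^2, whose minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)        # derivative of (w - 3)^2
        w -= learning_rate * gradient  # step against the gradient
    return w


for lr in (0.01, 0.4, 1.1):
    w = gradient_descent(lr)
    print(f"learning_rate={lr}: w={w:.4f}, loss={(w - 3) ** 2:.4f}")
```

Each step multiplies the distance to the minimum by (1 - 2 * learning_rate). With 0.01, the weight is still roughly 2 units away after 20 steps. With 0.4, it lands essentially on w = 3. With 1.1, it overshoots further on every step and ends around -112.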

#aibasics

25 Nov 2024 17:59

I'm introducing a new section on the channel: #aibasics !

Over the past two years, ML has been the top trend in the industry, with huge interest not just in tech but across various business domains. ML helps automate routine tasks and significantly decrease operational costs. This trend will most likely keep growing over the next few years or even longer.

As engineers, we should at least know the fundamentals of this technology. I mean not just using lots of GenAI tools in daily work, but understanding how it works under the hood: its limitations, capabilities, and applicability to business and engineering tasks. As for me, I have a significant knowledge gap here, which I plan to close over the next several months.

I plan to start with the following courses (they are absolutely free):
✏️ Machine Learning Crash Course from Google, which received fresh updates in November
✏️ LLM Course by Cohere

I will use those courses as a base and extend them with additional sources on demand.

So I'm starting my AI learning journey and will share my progress and key takeaways here 💪

19 Nov 2024 03:33

Manage Your Energy Level

Recently I wrote about the importance of high-quality vacations. What I didn’t share is that I went on vacation completely drained, with my internal resources at zero and even a diagnosis from a neurologist 😵‍💫. It is a tough state to be in, and I never want to feel like that again.

So I reflected on how to prevent burning out in the future.

First of all, I understand that it was my own fault, not the heavy workload, urgent issues, or company changes. It's the primary responsibility of any leader to maintain their internal resources and energy. That's very important. A leader cannot work without enough energy, as it's not possible to drive anything or meet business goals in that state.

Next, I studied various recommendations on what to do. The advice is usually very generic: exercise, walk, eat well, and have time for hobbies. Unfortunately, I already knew that, but it didn't help me. My issue is that I don't notice the point where I am completely drained, and by then it's too late to go for a walk.

So I need to monitor my internal state somehow. As technical people, we know that to control something, we need to measure it. One resource recommends the Welltory app, which analyzes personal health based on heart rate variability (Garmin watches already have similar features built in). It also uses data about sleep, steps, stress level, and more from almost any smartwatch. It looks like magic, but there is real science behind it. This isn’t an ad, just a tool I found useful 🙂.

I've been using the app for about 2 weeks now. The algorithm is still training (about 35% done), but I’m already using its basic features. I take measurements periodically and check my overall state: green, orange, or red. Based on this, I’ve started taking short recovery breaks at work to avoid hitting zero. I also track the overall health trend to see whether my daily routine needs adjustments, like more exercise or walks.

Burnout is a very common problem in our industry. That's why I decided to share what helps me monitor my internal state and maintain a good level of motivation and energy. Of course, 2 weeks is not enough to say whether the approach works. Like this post if the topic is interesting, and I'll share my results in 1-2 months.

Stay healthy and take care of yourself.

#softskills #productivity

11 Nov 2024 03:37

Uber’s Gen AI On-Call Copilot

GenAI continues its march into routine automation. This time, Uber shared their experience with Genie, an on-call support copilot for internal teams.

The issue is very common for large companies with many teams. There are channels where teams post questions and request help with a service or technology (at Uber, Slack channels with ~45,000 questions per month). Of course, there are plenty of docs and relevant articles, but they are fragmented and spread across internal resources. It's really hard for users to find answers on their own. As a result, the number of repetitive questions grows, and so do the load and demand on support engineers.

Key elements of implemented solution:
✏️ RAG (Retrieval-Augmented Generation): the approach used to work with the LLM.
✏️ Data Pipeline: Information from wikis, internal Stack Overflow, and engineering docs is scraped daily, transformed into vectors, and stored in an in-house vector database together with the source links. The data pipeline is implemented on Apache Spark.
✏️ Knowledge Service: When a user posts a question in Slack, Genie’s backend converts it into a vector and fetches the most relevant chunks from the vector database.
✏️ User Feedback: Users can rate answers as Resolved, Helpful, Not Helpful, or Relevant; these ratings are used to analyze answer quality.
✏️ Source Quality Improvements: A separate evaluation process improves source data quality. The LLM analyzes the docs and returns an evaluation score, an explanation of the score, and actionable suggestions for improvement. All this information is collected into an evaluation report for further analysis and fixes.
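Uber's post describes the architecture, not code. The following is therefore only a toy sketch of the retrieval step in a RAG flow:

1. Embed the question.
2. Rank the stored chunks by cosine similarity to it.
3. Put the best chunks, with their source links, into the LLM prompt.

The word-count "embedding", the in-memory index, the sample chunks, and the placeholder URLs are all assumptions. A real system would use an embedding model and a vector database.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: (chunk text, source link). URLs are placeholders.
CHUNKS = [
    ("To request Kafka topic access, open a ticket in the data-platform queue.",
     "https://wiki.example/kafka-access"),
    ("Spark jobs that run out of memory usually need a larger executor memory setting.",
     "https://wiki.example/spark-oom"),
    ("On-call rotations are configured in the paging tool under the team settings page.",
     "https://wiki.example/oncall"),
]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Offline part of the pipeline: embed every chunk once and keep its source link.
INDEX = [(embed(text), text, link) for text, link in CHUNKS]


def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (text, link) pairs most similar to the question, skipping unrelated ones."""
    q = embed(question)
    scored = [(cosine(q, vec), text, link) for vec, text, link in INDEX]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(text, link) for score, text, link in scored[:k] if score > 0]


def build_prompt(question: str) -> str:
    context = "\n".join(f"- {text} (source: {link})" for text, link in retrieve(question))
    return ("Answer using only the context below and cite sources.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


print(build_prompt("My Spark job fails with out of memory, what should I change?"))
```

Keeping the source link next to each chunk, as Genie does, lets the final answer cite where the information came from. Chunks with zero similarity are dropped so that unrelated text does not end up in the prompt.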

Since Genie’s launch in September 2023, Uber reports it has answered 70,000 questions with a 48.9% helpfulness rate, saving 13,000 engineering hours 😲. It's impressive! I definitely want to have something similar at my work. Just one small hurdle left: getting the budget and resources for implementation. No big deal, right? 😉

#engineering #usecase #ai

06 Nov 2024 15:38

Take a Vacation

Last week I was on vacation, so there was a short break in publications 😌. That's why I'd like to talk a little about vacations and why they are so important. A high-quality vacation is not just an opportunity to relax; it is also a preventive measure against many serious diseases.

But it’s not enough just to take vacations regularly; the way you spend them determines whether you recharge your internal battery or not.

My tips for a good vacation:

✏️ Take Enough Time: Ideally, a vacation should last at least 14 days in a single stretch. If you feel heavily exhausted, it's better to take 21 days. That is usually enough time to recharge.
✏️ Change the Scenery: Travelling to a new place (even a short trip) gives you new impressions and experiences and fills you with new ideas, inspiration, and energy. Spending time outside your usual surroundings significantly decreases overall strain, which is also supported by German researchers.
✏️ Digital Detox: Don't touch your laptop, don't open work chats, don't read the news, and minimize social media usage. Give your brain a rest from the constant information noise.
✏️ Be Spontaneous: Don't try to plan everything. Constantly following a schedule makes a vacation feel like work and keeps you from enjoying the moment. Spontaneous activities can bring more fun and satisfaction.
✏️ Do Nothing: Allow yourself time for idleness. That's really difficult, as it feels like wasting time that could be spent more effectively 😀. But that's the trick: a state of doing nothing rewires the brain and improves creativity and problem-solving capabilities.

So take care of yourself and plan proper rest throughout the year.

#softskills #productivity
